Bridging across bridges: engaging the geoscience research community in standards development (Part 2)

This is Part 2 of a multipart series.

First, I’d like to respond to some comments on Part 1, in which I posed a question: “A small but committed number of academic researchers are helping develop OGC standards, but the vast majority are not. Why do some get into it, and why don't more?”

One strong point from the comments (some made off-blog) was that domain scientists (i.e., non-computer scientists) should not have to engage in information technology (IT) or data management issues, much less in data standards development. Rather, they should be involved in an advisory capacity, leaving the informatics to informaticians. I completely agree. Nevertheless, I’ve come across a few rare domain scientists who stand in both the science and IT worlds, and do it well enough to make things happen on a global scale. These are outliers and should not be taken as “typical scientists”; nor should this be read as a criticism of less eclectic scientists. But I’m curious whether there are ways to nudge the system that would create more opportunities and rewards for domain scientists to work with IT and standards folks.

The thing is, “people tend to do what they really want to do”, as a wise supervisor once told me when I was trying to explain why I wasn’t getting his priorities done. He also recognized that he got the best work from employees who were tasked to do what they really wanted to do. I completely agree with this philosophy, so I’m not advocating that domain scientists try to become good at something they don’t want to do. There is really an ecosystem of science and technology tasks and people, and we depend on different people wanting to do different tasks.

What I do want to see is good data management and IT practices becoming easier and more natural for geoscientists to follow, so that they can focus on their science without having to focus so much on the technology. I want to look at ways to improve the technology of science without distracting the scientists with technology. Then scientists and researchers will get to do more of what they really want to do.

Examples of where this is starting to happen are the integrated tools and data sets used by the climate/meteorological research community: data arrives in netCDF format, is processed with CDO or similar tools, and is visualized with Ferret or other packages. The same goes for the Esri and HydroDesktop environments used by many environmental science researchers. Such environments handle the standards-based side of things, allowing users to conduct data search and analysis in a harmonized way. You use what you get, with greatly simplified format and coordinate conversions, but you also don’t explore beyond this horizon, most often because you don’t know what is, or could be, possible. This approach could be taken further, for example by incorporating the collection and validation of provenance and other metadata earlier, and in more context-sensitive ways, in users’ workflows (a small sketch of the “standards handle the plumbing” idea follows below).
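To make that concrete, here is a minimal sketch in Python using the xarray library, which reads netCDF through its CF-conventions-aware data model. The file name and variable name are hypothetical placeholders; the point is simply that the scientist never has to write format- or coordinate-handling code.

```python
# Minimal sketch: because the data follow the netCDF/CF standards, a generic
# library can open them and expose named, coordinate-aware variables.
# "tas_monthly.nc" and the variable "tas" are hypothetical placeholders.
import xarray as xr

ds = xr.open_dataset("tas_monthly.nc")   # format handling done by the library
tas = ds["tas"]                          # e.g., near-surface air temperature

# Coordinate-aware analysis: group by calendar year along the time axis,
# with no manual index bookkeeping by the user.
annual_mean = tas.groupby("time.year").mean()
print(annual_mean)
```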

But integrated environments don’t cover all use cases either. For example, to publish the results of scientific research, the underlying data must often be cited if not included in the publication, and standardized data management can assist with standardized data citation. This has been an active discussion area in ESIP and RDA, but not in the OGC. (I won’t get into the debate over the distinctions between data, databases, data sets, and data products; see Joe Hourcle’s excellent and humorous talk on this at Ignite@AGU a couple of years back.) This topic addresses an important part of science: reproducibility of data for subsequent verification and reanalysis. The science community would like citations to enable linking to the cited data set, whatever that takes (one concrete form of such a link is sketched below). New discussions are taking place about citation of highly dynamic data sets. Another such area is semantics, which cuts across all communities.
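As a hedged illustration of what a machine-actionable data citation can look like, the sketch below resolves a dataset DOI through the doi.org content-negotiation service (available for DataCite- and Crossref-registered DOIs). The DOI shown is a hypothetical placeholder, not a real data set.

```python
# Minimal sketch: a dataset DOI is both a citation and a resolvable link.
# Asking the doi.org resolver for "text/x-bibliography" returns a formatted
# citation; a plain GET would redirect to the data set's landing page.
# The DOI below is a hypothetical placeholder.
import requests

doi = "10.1234/example-dataset"  # hypothetical dataset DOI

resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
    timeout=30,
)
print(resp.text if resp.ok else f"Could not resolve DOI (HTTP {resp.status_code})")
```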

I also want to emphasize that science, technology, and standards development are interdependent and continually evolving. Case 1: I’ve sat with clients while prototyping a user interface or web page design for them, and gotten this reaction: “You can do that?? Hmm, can you do this [insert wish-list item] too?” Often the answer would be “why not?” Standards for data exchange, model integration, and visualization make that kind of exchange much more frequent and productive. Case 2: Every now and then something like the Internet, or even “just” the iPhone, comes along and shakes up the whole fabric of society, technology, and science. Various standards have to catch up, and new standards and even whole communities appear. Case 3: As the science and technology for satellite-based Earth observation and analysis improve, the complexity and volume of the data increase exponentially, requiring continual evolution in the standards and tools needed to support them.

So if science, technology, and standards are all interdependent, and standards tend to catch up as needed, what’s the problem? A big one is that data standards development generally is not supported or rewarded in academia the way it is in industry and government agencies. That means academic use of standards is limited to what already works, with little input into how the standards evolve. The standards community is missing out on a huge contribution that could come from academia, and academia is missing out on the rewards of influencing standards to help it do its work more efficiently and transparently.

I’m not saying academia is completely missing from the OGC; there are over a hundred universities with one or more professors, researchers or students registered on the OGC portal. The majority of these universities are in Europe. The US has only about 30 universities with OGC membership, and very few of these are active in standards development. I would contend that most US university members of OGC are there to learn and master the OGC standards, rather than to help construct and advance the standards to support geoscience research. We’re also not teaching OGC standards widely in academia.    

But NSF could help here, and EarthCube might be the key.

Enter EarthCube: The US National Science Foundation (NSF) EarthCube program is a long-term initiative to identify and nurture opportunities to make geoscience data, models, workflows, visualization, and decision support available and usable across all geoscience domains. This is an ambitious undertaking. Luckily for me, Anna Kelbert just published an excellent overview of the motivation and emerging structure of EarthCube, so I don’t have to repeat all that here. I’ll just say that there are now about 30 funded projects in varying stages of completion, with more on the way. They fall into three categories: Research Coordination Networks (outreach to potential users), Conceptual Designs (core architecture frameworks), and Building Blocks (technology components). EarthCube is intended to be community driven and community governed.

How it could happen: In the next segment, I'll propose a way to leverage EarthCube to loosely couple the NSF research agenda with the key IT standards development agendas.

Thanks to Joe Hourcle, Ingo Simonis, Scott Simmons and Carl Reed for contributions to this segment. 

The thoughts and opinions expressed here are those of the contributor alone, and do not necessarily reflect the views of EarthCube's governance elements, funding agency, or staff. 
