Bridging across bridges: engaging the geoscience research community in standards development
This is Part 1 of a multipart series. Click here for Part 2 or Part 3.
A number of academic researchers are committed to helping develop OGC standards, but the vast majority are not. Why do some get into it, and why don't more? For the past several years I've been looking for ways to stimulate academic involvement in geospatial standards development. I'm starting to see some potential, but the cultural & institutional barriers still dominate.
Btw, I'll be using "involved" and "committed" in the sense of the old line about ham and eggs: the chicken's involved, but the pig's committed (you can google that phrase for an interesting history).
Basically, geoscience researchers are committed to mastering their science, but generally not to mastering data management, standards development, or other "translational" matters for making their research results accessible across domains. Such efforts present whole new learning curves of concepts and processes that have much less potential to advance a scientist's career. The usual measure of a scientist's career is an index (the h-index is the best known) based on the number of scientific publications and the number of citations those publications receive in other publications. This is an arbitrary measure with a number of issues, but it is widely followed.
Even so, there's a growing, bubbling, emergent force in the sciences that's making interdisciplinary research more important, achievable, and professionally rewarding. The concepts and problems for understanding and modeling the Earth's climate, weather, water cycles, carbon flux, and many other interrelated subjects, are requiring increasing cooperation across subject domains. We're seeing major, long-term initiatives in the US, Europe, Australia and elsewhere now, starting to poke holes in the walls between subject domains, and between science domains and cyberinfrastructure development. I want to review a selective history first for context, then show how the pieces are starting to fit together.
Going back to the early 1990s, the World Wide Web Consortium (W3C), OGC and ISO/TC 211 started their work at about the same time, with distinct but complementary missions to improve information sharing across nations, subject domains, and industries. These are just a few of a veritable ecosystem of formal, international standards development organizations (SDOs) and industry consortia that have gradually strengthened their alliances with each other over the years to better accomplish their goals (e.g., see the recent article about OGC & W3C). One of these consortia, the Federation of Earth Science Information Partners (ESIP Federation), which started in the US in the late 1990s, has focused on collaborative research for Earth and space sciences. An international, all-sciences consortium (not just geo) emerged in 2013: the Research Data Alliance (RDA). Initial sponsors for RDA were the Australian government, the European Commission, and the US National Science Foundation (NSF); NIST and Japan are now getting involved. More about OGC, ESIP and RDA later.
Internetworking among the geosciences goes back much farther, to the World Data Centres and the Federation of Astronomical and Geophysical Data Analysis Services, established by the International Council for Science (ICSU) to manage data generated by the International Geophysical Year (1957–1958). It became clear after the International Polar Year (2007–2008) that these bodies could not respond fully to modern data needs, so they were disbanded by the ICSU General Assembly in 2008 and replaced by the ICSU World Data System in 2009.
Another international networking initiative, started in 2005 by the Group on Earth Observations (GEO), is the Global Earth Observation System of Systems (GEOSS). This is being developed to provide tools for data discovery, access, and decision support, with tasks organized by societal benefit areas (SBAs): climate, weather, water, agriculture, energy, health, biodiversity, ecosystems, and disasters. Through the development of a web-based broker that distributes users' queries across dozens of Earth observation catalogues hosted around the world (e.g., Global Change Master Directory, Pangaea, and many others), you can now reach over 14 million data collections through GEOSS; this number is growing rapidly as more agencies' catalogues are registered, week by week. This may seem paltry compared with a Google search, but we're talking about qualified data-searching with geographic and temporal filters, as well as the ability to select specific authoritative international data catalogues.
This is just a small sampling of efforts around the world to publish and internetwork Earth observations and geoscience data. But the consortia and initiatives just mentioned have emerged as key drivers in yet newer initiatives seeking to "bridge the bridges". The more some folks find out about the world, the more other folks want to be able to relate those findings with someone else's. And as computational and network technology have advanced, it's become both easier and harder: easier to find lots of data, but harder to know what it means, and how to relate all the pieces.
This is where standards come in: without standard vocabularies, taxonomies, metadata descriptions, and interfaces for discovery and access to data, a researcher has a daunting job just putting data from disparate sources into a common framework for analysis. No wonder geoscience researchers don't have time to mess with standards development; they're too busy finding, collecting, calibrating, converting and reformatting data from multiple sources, so they can run their intended geophysical models. <wink>
Efforts to drive standards that better support geoscience research have varied globally, with three big players emerging: Europe, Australia and the US. Geoscience research in Europe is largely funded by the European Commission through initiatives like the Framework Programmes (FP6, FP7) and now Horizon 2020. These programs have helped implement a European directive called INSPIRE (Infrastructure for Spatial Information in the European Community), which defines and mandates the use of international standards for geospatial and geoscience information. Much of the core support for GEOSS has also come from FP7 projects, some of which are still underway, with more to come from Horizon 2020. Geoscience Australia and CSIRO are the main drivers for the Australian research program.
I can't say much more about European or Australian research programs, as I've been most involved in the US. But the US situation is what I really want to talk about now. We have some catch-up to do, and it's starting to happen.
Next: About the NSF EarthCube initiative and its potential relation to standards development.
Thanks to Mark Parsons and Ingo Simonis for contributions to this segment.
The thoughts and opinions expressed here are those of the contributor alone, and do not necessarily reflect the views of EarthCube's governance elements, funding agency, or staff.
Comments
Mark Parsons (not verified)
Wednesday, 2015-01-07
Thanks for a nice overview, David.
Bruce Caron (not verified)
Thursday, 2015-01-08
Thanks for your insights. Looking forward to working with you to articulate how standards can enable open geoscience!
Carl N Reed III (not verified)
Saturday, 2015-01-10
Thanks for writing this blog! Very well written and informative. Looking forward to part 2. If only we can also get this message out to the US University geography community! I am amazed how few geography/GIS programs address issues related to geodata sharing within and across information communities. Too many GIS programs focus on tools and not solutions and problem solving. And forget modelling. Semantics for cross community interoperability? No way. I feel a blog coming on :-)
Joe Hourclé (not verified)
Saturday, 2015-01-10
(reposting a response sent to the AGU-ESSI mailing list ... in two parts)

And my counter: Most researchers shouldn't get directly involved in standards development. They should be consulted to determine the requirements, and given the opportunity to comment, but unless they're actively involved in writing tools to generate or consume the standards, their time is better spent doing research. Leave the standards to the informatics people who might understand the implications of selecting a data model, interchange serialization, date format, and whatever other nit-picking details that will make the scientist's eyes roll back in their head just as fast as mine do when I go to helioseismology lectures.

Our 'governance model' for the VSO is that we have two groups -- the implementors, and the science steering committee. We do have a solar physicist (with a comp. sci. degree) in the implementor group to answer the simple questions, and for the others we refer to the science steering committee. The steering committee has shot down a few ideas that I still think were good*, and sometimes we just implement things without telling them**, and as it doesn't actually affect them, they don't notice.

-Joe
* Like storing the spectral range to present as results, but normalizing all of the values for searching by log of meters, so we don't have to deal with unit conversions when searching
** I store everything in Angstrom for searching on some of the systems, even when it was in GHz or keV.
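The kind of unit normalization Joe describes might look roughly like the sketch below: convert every incoming spectral coordinate (wavelength, frequency, or photon energy) to a single unit before indexing, so searches never have to convert on the fly. The conversion constants are standard physics; the function and unit labels are illustrative only, not taken from any actual VSO code.

```python
# Minimal sketch: normalize spectral coordinates to Angstroms for a search index.
# Names here (to_angstrom, the unit strings) are hypothetical, for illustration.

C = 2.99792458e8          # speed of light, m/s
HC_KEV_ANGSTROM = 12.398419  # h*c expressed in keV·Å, so λ[Å] = 12.398.../E[keV]

def to_angstrom(value, unit):
    """Convert a spectral coordinate to wavelength in Angstroms."""
    if unit == "angstrom":
        return value
    if unit == "nm":
        return value * 10.0
    if unit == "GHz":
        # λ = c / ν : convert GHz to Hz, then metres to Angstroms
        return (C / (value * 1e9)) * 1e10
    if unit == "keV":
        # photon energy to wavelength via h·c
        return HC_KEV_ANGSTROM / value
    raise ValueError(f"unsupported unit: {unit}")
```

With everything stored in one unit, a range query against the index is a plain numeric comparison, regardless of whether the instrument team reported GHz, keV, or nanometers.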
David Arctur
Saturday, 2015-01-10
Joe, you are right on about most domain scientists. Occasionally one finds an inspiring person who is not just good at his/her domain science, but appreciates and takes care to understand enough IT & data management to come up with science that works in the real world on a large scale. It's really an ecosystem; we don't all have to be excellent at everything, or even very many things.
IMO, we can't and don't need to engage the “typical researcher” in data management. But the research funding agencies can make it easier by supporting initiatives that build the bridges, so data mgmt and interoperability become more and more transparent to the typical researcher. We need to make it better for those few inspiring folks who can bridge science and cyber work.
This is getting into the substance of Part 2 for this blog ...
Joe Hourclé (not verified)
Monday, 2015-01-12
(part 2 from the AGU-ESSI mailing list) (edited for brevity -dka)
We actually have plenty of scientists involved. SPASE, FITS and WCS (World Coordinate System, not the OGC one) are likely 90% domain scientists ... which is part of why I know about some of the issues when the scientists develop the standards.
... they were focusing on defining data models & metadata fields ... I cared about actually trying to implement it, so I tried to get them to define data interchange standards. They agreed on XML and ISO dates, but I lost the argument on handling enumerations as XML elements. (So they're strings, making it difficult for me to extend SPASE by overlaying the specific info that we need for VSO ... and so we got no benefit out of it, and have never actually used it, as it only added to our workload.)
In my opinion, a large part of the problem is that the IT folks are typically seen as 'science support', and more like secretaries than people who contribute important skills. As such, it's much harder for us to get approval to travel to scientific meetings where we can talk with other people who are implementing systems.
... I shouldn't have to fight a PI team for months, calling into their telecon week after week, trying to get them to put checksums in their data. ("We're using RAID6; we'd have to lose 2 disks for it to be an issue" -- completely ignoring the mirror sites and/or their tape archive.) And thus the only reason they have checksums is that their power center had a massive power outage 2 weeks before launch, and around 1/3 of the arrays lost 2 disks.
I'm going to stick with the 'get the requirements from the scientists, and not tell them how you implemented it' model ... it's so much more efficient.
David Arctur
Tuesday, 2015-01-13
Joe, this reminds me of something from How Google Works, between a manager and the founder Larry Page:
"A few months later, Jonathan presented Larry with a product plan that was a manifestation of the gate-based approach at its finest. There were milestones and approvals, priorities, and a two-year plan of what products Google would release and when. It was a masterpiece of textbook thinking. All that remained was for him to receive a rousing round of applause and a pat on the back. Sadly, this was not to be: Larry hated it. "Have you ever seen a scheduled plan that the team beat?" he asked. Um, no. "Have your teams ever delivered better products than what was in the plan?" No again. "Then what's the point of the plan? It's holding us back. There must be a better way. Just go talk to the engineers."
"Talking to the engineers" isn't a substitute for a plan; I don't see how we'll get away from planning & management (certainly not for NSF grants). But "talking to the engineers" also reminds me of the story of the failing GM auto plant in Fremont, California that Toyota took over, a plant notorious for high defect rates. The union required Toyota to rehire the same workers. Within just a couple of years, the plant was getting close to zero defects. It was done by listening to the workers on the production line when they said something wasn't working, and how things could get better. The most remarkable thing about this story wasn't that Toyota turned the plant around, but that GM didn't learn from the experience. They kept on manufacturing the same top-down way.
So how do we get this form of crowdsourcing (listening to the engineers, IT, etc) recognized by research leadership? How can EarthCube help with this?
Rick Hooper (not verified)
Monday, 2015-01-12
David, thanks for this entry; I look forward to the next part. In response to the other comments about engaging academic scientists, I believe that one responsibility the academic community has is to TEACH its students about using standards and data management techniques, even if the scientists themselves are not interested in contributing to their development. The informatics community and those rare individuals who bridge the informatics and domain science divide need to help develop best practices to be shared with domain scientists. This is how we will instantiate these ideas into the science community.