Big Data vs SDI? It's not an either/or.
There's a debate in the geo blogosphere about "Big Data" versus spatial data infrastructure (SDI), that is, deriving information from searches of unstructured data versus deriving information from structured data.
Thierry G's blog post last week, "From Lego to Play-Doh: I plead guilty at the altar of Big Data," provides a fine and amusing summary of the argument for big data. The opposing argument is one that spatial data coordinators, data managers and standards organizations have been making for twenty years: we should all be working with others to develop and use data standards, metadata standards, encoding standards and geospatial software interface standards, including catalog interfaces.
One reason the debate matters is that geospatial semantics is emerging as a key element of the Semantic Web, and the Semantic Web, along with linked data, is going to make search engines even more powerful and useful.
Anyone interested in this debate would have been hard pressed to choose among sessions at the recent OGC Technical Committee meeting in Boulder. There were discussions about architectures that use both OGC Web Services and linked data; about where data sharing communities should draw the line between sufficiently and insufficiently harmonized ontologies; and about GeoJSON, REST, GeoSPARQL, and whether to develop a specification for writing RESTful specifications. Big data vs. structured data also figured in discussions of augmented reality, urban modeling, sensor discovery, and crowdsourcing.
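As a small illustration of the structured, linked data side of those discussions, here is a minimal sketch in Python using rdflib. The feature, URIs and coordinates are invented for illustration, and GeoSPARQL's spatial filter functions (geof:*) would require a GeoSPARQL-enabled triple store such as Apache Jena rather than plain rdflib.

```python
# A minimal sketch: querying spatial features published with the
# GeoSPARQL vocabulary. The example feature and URIs are hypothetical.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/features/> .

ex:boulder a geo:Feature ;
    geo:hasGeometry ex:boulder_geom .
ex:boulder_geom geo:asWKT
    "POINT(-105.27 40.01)"^^geo:wktLiteral .
""", format="turtle")

# Plain SPARQL over the GeoSPARQL vocabulary: find each feature's WKT.
results = g.query("""
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?feature ?wkt WHERE {
    ?feature geo:hasGeometry ?geom .
    ?geom geo:asWKT ?wkt .
}
""")

for feature, wkt in results:
    print(feature, wkt)
```

The point is simply that spatial data published with shared vocabularies can be discovered and queried with generic linked data tools, which is what makes the "blended" architectures discussed in Boulder plausible.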
As Carl Reed, the OGC's CTO, says, "It's not an either-or. There are requirements for both, whether they're used independently or in blended approaches. Scientists, researchers, military analysts and others will continue needing to assess the resolution, provenance, accuracy, and other measures of spatial/temporal data quality and fitness for use. At the same time, they, along with many others (business intelligence, GEOINT, social networking, etc.), are grateful for the gooey Big Data tar ball and innovative tools to make inferences and discover trends."
Comments
Thierry_G (not verified)
Monday, 2011-10-24
Permalink
I agree it's not an either/or
I agree it's not an either/or - life never is. Standards like the OGC's are important for enabling data sharing between systems, much as travellers need power adapters to plug their appliances into foreign sockets. And of course you have adopted KML as a standard, which should ensure that lay users, too, will share spatial data with a semblance of intelligence. I do realise there are applications where structure and accuracy are very important (when I worked in the oil industry, a positional error of just 20 metres could cause millions of dollars in losses). The point I was making is that those of us who grew up in the traditional geodata industry will need to let go of some of our desire to structure and standardise everything, or we will miss the boat in the new world of big data. But both worlds have something to bring to the table, so I think we are all agreed on that.
Hervé Caumont (not verified)
Saturday, 2011-10-29
Permalink
Requirements for both would
Requirements for both would nevertheless argue for a refoundation of the DCP (distributed computing platform) architecture model, in recognition of new tools and frameworks.
Such a refoundation would strengthen the business-to-business web services we are used to implementing for software and data model interoperability, while opening the door for geospatial domain practitioners to this new 'social web' computing era.
A refoundation is needed because one fundamental aspect of big data analytics is the computing infrastructure itself.
That infrastructure is nothing like the way SDIs or EO Ground Segments deploy resources today. It would more clearly embrace the distributed computing platforms of Web 2.0, which so many big players have built to meet reliability, scalability and performance requirements while coping with the 'data deluge' phenomenon!
Thanks, David, for sharing this very interesting summary.
Herve
Lance McKee
Tuesday, 2011-11-08
Permalink
(Posted by Lance McKee for Dr
(Posted by Lance McKee for Dr Michael Sanderson)
As an end user I want two things when I come to assemble data for an activity or make a decision. First, I want to be able to access data that are catalogued (so I can work out semantically whether these data are relevant) and indexed (ideally in a form that lets me link my own data to the data I discover). Second, I want to be able to get at these data with a search tool that then enables me to conflate them with data that I own (in my data warehouse).

So big data is both structured and unstructured. If the data are structured in an SDI (and I accept they may come with Digital Rights Management access rules) and I have a set of tools that I use, I want to know those tools will work with the SDI. If the data are unstructured (much of social media), I want, ideally, to use the same tools to add sense to them and inform the decisions I make.

The web as we know it was built for documents. Space and time are not documents. They need to be added to the web in a form which allows space and time to be leveraged if we are to move forward. Things are developing rapidly, and at such times chaos rules. So we have RDF/SPARQL, and I see the emergence of schema.org (from Bing, Yahoo and Google) as a parallel move to add structure to unstructured web documents. Standards need to emerge, or I won't be able to traverse any data successfully, Big or otherwise.
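To make the schema.org move the comment describes concrete, here is a hedged sketch of spatial markup for a web page, emitted from Python. The venue name and coordinates are invented; Place and GeoCoordinates are real schema.org types, and JSON-LD is one serialization search engines accept.

```python
# A minimal sketch of schema.org spatial markup, generated from Python.
# The venue name and coordinates are invented for illustration.
import json

place = {
    "@context": "https://schema.org",
    "@type": "Place",
    "name": "Example meeting venue",   # hypothetical
    "geo": {
        "@type": "GeoCoordinates",
        "latitude": 40.01499,          # hypothetical point in Boulder
        "longitude": -105.27055,
    },
}

# Serialize as JSON-LD, ready to embed in a web page.
print(json.dumps(place, indent=2))
```

Embedded in a page as a script element of type "application/ld+json", markup like this lets a search engine index the document by location as well as by keyword, which is precisely the "space and time added to the web" the comment asks for.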
Harinath Prabhakaran (not verified)
Wednesday, 2014-09-03
Permalink
David, In just the few
David,
In just the few years since you posted this article, I think we've seen progress in the development of standards around RESTful APIs and ways into social media content repositories. But by 2014 I still don't see the kind of clean global metadata standard that would help integrate data more easily across disciplines. What I see emerging lately is on-premises and cloud platforms, like IRI (CoSort) and Splunk respectively, that discover, prepare or mash up, and analyze data in both unstructured and structured feeds or repositories, irrespective of unity. Please post an update on the SDI vs. Big Data debate.