Big Data and Clouds: Research Presentations at IGARSS

As part of its Innovation Program, OGC invited international experts to two sessions at this year’s IGARSS conference. IGARSS, the International Geoscience and Remote Sensing Symposium, took place in sunny, hot Valencia, Spain, during the week of July 23rd.


IGARSS 2018 main building


Both sessions addressed aspects of Big Data in distributed clouds. The first session focused on data integration, processing, and visualization challenges, whereas the second session put these aspects into context and provided actual examples. Both sessions were very well attended, with people packed to the very last seat.


In summary, impressive progress has been made over the last few years in processing Big Earth Observation Data in distributed clouds. Many aspects, however, in particular semantic interoperability, machine learning, and rapid data exploration, will keep research and development teams busy for years to come.


Konstantina Bereta of the National and Kapodistrian University of Athens introduced ontology-based access to and visualization of big vector and raster data. She described potential migration and coexistence paths for data stored in relational databases and their on-the-fly mapping to RDF. It became clear that vector data can be handled well, but coverages are still a challenge, as appropriate query languages are still under development.


Tomáš Řezník, professor at Masaryk University, demonstrated how big data can serve the agricultural domain. In his case, based on the OGC Observations and Measurements standard, Tomáš showed how a yield production model, sensor data, and machinery fleet monitoring data can be integrated and visualized efficiently using standardized data models.


Also for the agricultural domain, Christian Zinke of InfAI e.V. outlined ways to achieve semantic interoperability based on Linked Data principles. He emphasized the issues that current Linked Data frameworks and tools have with Big Data, and introduced his tool Limes, which addresses a number of these issues. Limes can build links between features by analyzing their temporal and topological relationships. How to handle ‘link overload’ is still an unresolved challenge, though.
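To make the idea of linking features by their temporal and topological relationships a bit more concrete, here is a minimal Python sketch. It is not Limes and does not use its API: the `Feature` class, the bounding-box test, the day-of-year intervals, the `intersectsDuring` link name, and the sample data are all assumptions made up for illustration. It also compares every pair of features, a brute-force approach that would not scale to Big Data.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature: an ID, a bounding box, and a validity interval."""
    fid: str
    bbox: Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
    start: int  # first valid day (illustrative: day of year)
    end: int    # last valid day, inclusive


def intersects(a: Feature, b: Feature) -> bool:
    """Topological test: the two bounding boxes share at least one point."""
    return (a.bbox[0] <= b.bbox[2] and b.bbox[0] <= a.bbox[2]
            and a.bbox[1] <= b.bbox[3] and b.bbox[1] <= a.bbox[3])


def overlaps_in_time(a: Feature, b: Feature) -> bool:
    """Temporal test: the two validity intervals share at least one day."""
    return a.start <= b.end and b.start <= a.end


def discover_links(features: List[Feature]) -> List[Tuple[str, str, str]]:
    """Emit a (subject, predicate, object) link for each pair passing both tests."""
    links = []
    for a, b in combinations(features, 2):
        if intersects(a, b) and overlaps_in_time(a, b):
            links.append((a.fid, "intersectsDuring", b.fid))
    return links


# Made-up sample data, loosely inspired by the agricultural setting.
features = [
    Feature("field_a", (0, 0, 10, 10), 100, 200),
    Feature("sensor_b", (5, 5, 6, 6), 150, 160),
    Feature("field_c", (20, 20, 30, 30), 100, 200),
    Feature("tractor_d", (8, 8, 12, 12), 250, 260),
]
links = discover_links(features)
print(links)  # [('field_a', 'intersectsDuring', 'sensor_b')]
```

Only one of the six candidate pairs becomes a link here. With millions of features and looser criteria, the number of links grows quickly, which gives a feel for why ‘link overload’ is a real concern.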


I presented OGC's ongoing architecture standardization work for geospatial Big Data processing in hybrid cloud environments. I outlined the latest results from the Innovation Program and addressed, in particular, the semantic aspects that I discussed the other day with Mihai Datcu from the German Aerospace Center (DLR). With all the Machine Learning and Deep Learning solutions presented at IGARSS (the two topics were the subject of 17 sessions!), we wondered how a user can actually apply, in a distributed environment, containerized applications that are built with specific training, datasets, ground truths, and semantics. This is an aspect that we will follow up on in the future.


The first session concluded with Chris Lynnes’ presentation on generalizing a data analysis pipeline in the cloud to handle diverse use cases in NASA’s Earth Observing System Data and Information System (EOSDIS). From the data provider perspective, he presented NASA’s plan to encapsulate every step of the processing chain as an individual service, which allows users to inject external resources into the chain more easily. Research questions remain concerning the optimization of each of these services, efficient in-cloud analytics, and support for Machine Learning approaches. The question of the optimal data store remained unanswered, as it depends too much on the actual requirements and usage scenarios, but the plan is to develop a set of guidelines for specific situations.
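The benefit of wrapping each processing step as its own service can be illustrated with a small Python sketch. This is not the EOSDIS design: plain local functions stand in for networked services, and the step names (`subset`, `rescale`, `average`, `clip`) and the `inject` helper are hypothetical.

```python
from typing import Callable, List, Tuple

# A "service" is modeled as a function from one list of samples to another.
Service = Callable[[List[float]], List[float]]
Chain = List[Tuple[str, Service]]


def subset(data: List[float]) -> List[float]:
    """Keep the first four samples (stand-in for spatial or temporal subsetting)."""
    return data[:4]


def rescale(data: List[float]) -> List[float]:
    """Multiply every sample by two (stand-in for a unit conversion)."""
    return [x * 2 for x in data]


def average(data: List[float]) -> List[float]:
    """Reduce the samples to their mean."""
    return [sum(data) / len(data)]


def run_chain(chain: Chain, data: List[float]) -> List[float]:
    """Pass the data through each service in order."""
    for _name, service in chain:
        data = service(data)
    return data


def inject(chain: Chain, after: str, name: str, service: Service) -> Chain:
    """Return a new chain with an extra service placed after the named step."""
    index = [step_name for step_name, _ in chain].index(after)
    return chain[:index + 1] + [(name, service)] + chain[index + 1:]


def clip(data: List[float]) -> List[float]:
    """User-supplied external step: cap every sample at 6."""
    return [min(x, 6) for x in data]


default_chain: Chain = [("subset", subset), ("rescale", rescale), ("average", average)]
custom_chain = inject(default_chain, after="rescale", name="clip", service=clip)

samples = [1, 2, 3, 4, 5, 6]
print(run_chain(default_chain, samples))  # [5.0]
print(run_chain(custom_chain, samples))   # [4.5]
```

Because every step shares the same interface, a user can add a step without touching the others. The open questions mentioned in the talk, such as how to optimize each service, are exactly what this toy version leaves out.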



In addition to further technology overviews such as Peter Baumann’s Datacube survey, the second session placed big data and distributed cloud processing into various contexts: the Canadian climate and earth observation analytics efforts presented by Tom Landry; the request to map Analysis Ready Data to United Nations Sustainable Development Goal targets and indicators by Alex Held; water body classifications as presented by Otto Wagner; and the Colombian Data Cube by Harold Castro.


Overall, we had two highly successful sessions that helped calibrate current research targets and define future ones. IGARSS itself was dominated by Machine Learning and Deep Learning presentations and promised a set of micro-satellite-based SAR constellations in the very near future!


OGC's engagement in IGARSS has been supported by the DataBio project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064.