ApacheCon 2018 - Geospatial Track

Monday, 24 September 2018 05:00 EDT
Thursday, 27 September 2018 13:00 EDT



ApacheCon NA 2018 will feature a Geospatial Track, as announced in a joint press release from Apache and the Open Geospatial Consortium (OGC). Two years ago, the first dedicated Geospatial Track held at ApacheCon led to the creation of an Apache geospatial mailing list ( lists.apache.org/list.html?geospatial [at] apache.org ) and helped shape several OGC activities. The OGC's collaboration with the Apache community is an opportunity to connect some of the most widely used open-source projects with the innovation and standards work of the OGC.

The Geospatial Track will be held on Wednesday, September 26th. Here is the draft program:

  1. Speaker: Martin Desruisseaux - Title: Which geospatial API for the cloud?
  2. Speaker: Julian Hyde - Title: Spatial query on vanilla databases
  3. Speaker: Jinchul Kim, Navis - Title: Spatial index optimization using Lucene index and GIS query support
  4. Speakers: Aaron Williams, Ben Lewis - Title: Interacting with Billions of National Water Model (NWM) Predictions using Apache Kafka and MapD
  5. Speaker: Tom Landry - Title: Apache Spark MLlib applied to geospatial imagery for flood indication
  6. Speaker: Lewis McGibbney - Title: Introducing Apache SDAP (Incubating): An Integrated Data Analytic Center for Big Science Problems
  7. Speaker: George Percivall - Title: Geospatial data and processing in Apache projects

The last presentation will include time for discussion to coordinate geospatial work across the projects.

There will be a Geospatial BOF in the early evening.

Abstracts
Which geospatial API for the cloud? Martin Desruisseaux

The Open Geospatial Consortium (OGC) defines many international standards that make interoperability possible between different geospatial applications. Most standards are articulated around data formats (Well Known Text, Geography Markup Language, etc.) and web services (Web Map Service, Web Feature Service, etc.). Those standards enable data transfers between server machines, where data are stored, and client machines, where data are typically processed (except with the Web Processing Service). But in a world of petabytes of Earth Observation data, bringing the data to the algorithm is not always practical; there is sometimes a need to bring the algorithm to the data instead. Google Earth Engine, Open Data Cube, OpenEO and AWS Lambda are examples of environments where data are hosted and computed remotely. In those environments, the OGC standards for data transfer do not apply as well as in the ''classical'' situation. Consequently, each cloud environment defines its own, non-standard API for handling geospatial data.

This presentation will show how a long-standing OGC effort, GeoAPI, could apply to the cloud environment for some kinds of problems. An example of remote execution using the same standard API in both the Java and Python languages will be shown. We will present the advantages and drawbacks of using a standard API. In particular, the perceived complexity of international standards should be weighed against the problems of popular, simpler alternatives. Apache Spatial Information System (SIS) will be presented as a GeoAPI implementation, with a focus on new features, some of them resulting from evolutions in the standards.

This talk is aimed at people with an interest in international standards applied to geospatial data, their implementation in Apache SIS, and how cloud environments may affect those standards. It will introduce some advanced features, such as dynamic datums in spatial referencing, but mainly as illustrations of the expertise embodied in international standards.

Spatial query on vanilla databases Julian Hyde

Spatial and GIS applications have traditionally required specialized databases, or at least specialized data structures such as R-trees. Unfortunately, this means that hybrid applications such as spatial analytics are not well served, and many people are unaware of the power of spatial queries because their favorite database does not support them.

In this talk, we describe how Apache Calcite enables efficient spatial queries using generic data structures such as HBase’s key-sorted tables, using techniques like Hilbert space-filling curves and materialized views. Calcite implements much of the OpenGIS function set and recognizes query patterns that can be rewritten to use particular spatial indexes. Calcite is bringing spatial query to the masses!
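The space-filling-curve technique mentioned above can be illustrated with a small, self-contained sketch (this is not Calcite code; the functions below implement the classic Hilbert-curve index mapping). Encoding each grid cell's (x, y) coordinates as a position along a Hilbert curve turns a 2D bounding box into a small set of keys that cluster into contiguous ranges, which a key-sorted store such as HBase can scan efficiently:

```python
def xy2d(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its
    distance along the Hilbert curve; nearby cells get nearby keys."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: Hilbert distance d back to cell (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# A 2x2 bounding box on an 8x8 grid maps to a handful of row keys:
keys = sorted(xy2d(8, x, y) for x in (2, 3) for y in (2, 3))
```

Consecutive curve positions are always adjacent cells, which is why range scans over Hilbert keys visit spatially clustered data.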

Spatial index optimization using Lucene index and GIS query support Jinchul Kim, Navis

The importance of handling GIS data in the telecommunications industry is ever increasing, especially in the OSS (Operation Support System) field. SK Telecom, Korea's number-one telecommunications provider, has confronted the same problem. To obtain current status in a timely manner, we have been using Druid for years with much success, but its lack of native handling of GIS data has made us keep old legacy systems, which are expensive and hard to expand.

To address this, we surveyed various software stacks and found that Apache Lucene has sufficient capabilities in both functionality and performance. In this session, we will share how we integrated the two technologies (Apache Druid and Apache Lucene) to handle GIS data, and will present use cases.

Interacting with Billions of National Water Model (NWM) Predictions using Apache Kafka and MapD Aaron Williams, Ben Lewis

The increasing availability of large-scale cloud computing resources has enabled large-scale environmental predictive models such as the National Water Model (NWM) to be run essentially continuously. Such models generate so many predictions that the output alone presents a big data computing challenge to interact with and learn from.

Researchers at the Harvard Center for Geographic Analysis are working with the open-source, GPU-powered database MapD and Apache Kafka to provide true real-time, interactive access to NWM predictions for stream flow and ground saturation across the entire continental US and from present conditions to 18 days in the future. Predictions can be viewed prospectively, “how will conditions change going forward?” as well as retrospectively, “how did condition predictions evolve up to any given present?”. Water conditions can also be tracked in space and time together as storms move across the country.

The speed and flexibility of the GPU analytics platform allows questions such as “how did the stream flow prediction error change over time?” to be answered quickly with SQL queries, and facilitates joining in additional data such as the location of bridges and other vulnerable infrastructure, all with relatively low-cost computing resources. MapD and other open-source high-performance geospatial computing tools have the potential to greatly broaden access to the full benefits of large-scale environmental models being deployed today.
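The prospective/retrospective distinction described above comes down to which time axis a query filters on: the model run's reference time or the predicted valid time. A minimal sketch of that query pattern, using sqlite3 as a stand-in for MapD and an invented miniature schema (the real table names, columns, and values are hypothetical):

```python
import sqlite3

# Hypothetical miniature of a forecast table: each model run
# (reference_time) predicts stream flow for several future valid_times.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE nwm_predictions (
    reference_time TEXT,   -- when the model run started
    valid_time     TEXT,   -- the moment being predicted
    reach_id       INTEGER,
    flow_cms       REAL)""")
rows = [
    ("2018-09-24T00", "2018-09-25T00", 101, 12.0),
    ("2018-09-24T00", "2018-09-26T00", 101, 14.0),
    ("2018-09-25T00", "2018-09-26T00", 101, 18.5),  # later run revises upward
]
conn.executemany("INSERT INTO nwm_predictions VALUES (?,?,?,?)", rows)

# Prospective: one model run's view of the future for a single reach.
prospective = conn.execute("""
    SELECT valid_time, flow_cms FROM nwm_predictions
    WHERE reach_id = 101 AND reference_time = '2018-09-25T00'
    ORDER BY valid_time""").fetchall()

# Retrospective: how successive runs revised the prediction
# for one moment in time.
retrospective = conn.execute("""
    SELECT reference_time, flow_cms FROM nwm_predictions
    WHERE reach_id = 101 AND valid_time = '2018-09-26T00'
    ORDER BY reference_time""").fetchall()
```

At NWM scale the same two filters run over billions of rows, which is where a GPU-backed columnar engine earns its keep.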

Apache Spark MLlib applied to geospatial imagery for flood indication Tom Landry

Apache Spark has risen as one of the most important tools for big data analytics in a distributed computing environment. A relatively recent functionality, Deep Learning Pipelines, brings deep learning into the Pipelines API of MLlib (Spark's machine learning library). We will show how this functionality opens opportunities for processing and analyzing large volumes of images, in particular image time series. As an example, we will demonstrate how a large historical set of satellite images, available through an open data cube, can be featurized through deep neural networks. The resulting features will be combined with records of flooding events in order to train MLlib's algorithms to recognize flooding conditions. The resulting models could potentially provide flood risk indicators.
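The featurize-then-classify pattern the abstract describes can be sketched in miniature without Spark. In the sketch below, a trivial per-band mean stands in for deep-network features, a hand-rolled perceptron stands in for an MLlib classifier, and the images and flood labels are entirely synthetic; only the shape of the pipeline matches the talk.

```python
import random

random.seed(0)

def featurize(image):
    """Stand-in for deep-network featurization: per-band mean over pixels."""
    n_bands = len(image[0])
    n_pix = len(image)
    return [sum(px[b] for px in image) / n_pix for b in range(n_bands)]

def make_image(flooded):
    # Synthetic 2-band "satellite tile": flooded scenes have a higher
    # mean value in the second (wetness-like) band.
    base = 0.7 if flooded else 0.2
    return [(random.gauss(0.4, 0.05), random.gauss(base, 0.05))
            for _ in range(64)]

# Labeled training set: featurized images joined with flood records.
data = [(featurize(make_image(f)), 1 if f else 0)
        for f in [True, False] * 50]

# Tiny perceptron in place of an MLlib classifier.
w, b = [0.0, 0.0], 0.0
for _ in range(20):
    for x, y in data:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = y - pred
        w = [wi + 0.1 * err * xi for wi, xi in zip(w, x)]
        b += 0.1 * err

def predict(image):
    """Flood indicator for a fresh image."""
    x = featurize(image)
    return w[0] * x[0] + w[1] * x[1] + b > 0
```

In the real pipeline, featurization and training would both run distributed over the data cube; the point here is only that features extracted from imagery, joined with event records, are enough to train a flood indicator.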
Geospatial data and processing in Apache projects George Percivall, Ingo Simonis

Multiple Apache projects implement geospatial data structures and processing. This presentation provides an overview of geospatial implementations across Apache projects, including a discussion of how open, consensus-based standards enable interoperability and interchangeability of open source software components. Highlights of open geospatial standards from the Open Geospatial Consortium (OGC) and other organizations are included.


Apache SIS provides data structures and methods for geographic features, including coordinate reference systems, e.g., OGC WKT CRS 2. Several projects provide geospatial structures and methods for Apache Spark, e.g., how to apply Simple Features in Spark. Several projects address spatial indexes on gridded structures, building on Apache projects and the related OGC DGGS standard. The Apache Science Data Analytics Platform (SDAP) enables fast analysis of oceanographic data. A recent Location Powers workshop highlighted linked geo-data building on Apache projects.

OGC standards play an important role in data exchange, an area that is increasingly being addressed by the Apache community. For example, running Spark on data stored in multiple Cassandra instances is facilitated by using the geospatial data models provided by OGC standards. A similar advantage arises when publishing results after analysis.

Coordination of geospatial topics increases data quality and reduces development effort. Coordination based on the use of open consensus standards provides stable and proven APIs and encodings for interoperability and interchangeability of software components. After a session at ApacheCon two years ago, a mailing list was established: geospatial [at] apache.org. The presentation will conclude with an open discussion of geospatial projects across Apache.

For additional information, please contact gpercivall [at] opengeospatial.org