ApacheCon 2019 - Geospatial Track

Monday, 9 September 2019 05:00 EDT
Monday, 9 September 2019 17:00 EDT

ApacheCon NA 2019 will feature a Geospatial Track, as announced in a joint press release from the Apache Software Foundation and the Open Geospatial Consortium (OGC).
A Geospatial Track has been held at ApacheCon for four years. It led to the creation of an Apache geospatial mailing list (geospatial@apache.org) and has helped shape several OGC activities. OGC's collaboration with the Apache community is an opportunity to connect the most widely used projects in open source with the innovation and standards of OGC.

The Geospatial Track agenda:

  1. Geospatial Data and Processing - Reusable Building Blocks - George Percivall, OGC
  2. Geospatial Data Management in Apache Spark - Jia Yu, ASU & Mohamed Sarwat, ASU
  3. Apache Science Data Analytics Platform (SDAP) - Thomas Huang, JPL
  4. Using GeoMesa on top of Accumulo, HBase, Cassandra, and big data file formats for massive geospatial data - a LocationTech Project - James Hughes, CCRI & Eddie Pickle, Radiant Solutions
  5. Geospatial Indexing and Search at Scale with Apache Lucene - Nick Knize, Elastic
  6. GeoSpatial and Temporal Forecasting in Uber Marketplace - Chong Sun, Uber and Brian Tang, Uber
  7. Realtime Geospatial Analytics with GPUs, RAPIDS, and Apache Arrow - Josh Patterson, NVIDIA

There will be a Geospatial BOF following the last presentation.
See the ApacheCon NA 2019 website for more information, the schedule, registration, etc.

Geospatial Data and Processing - Reusable Building Blocks

George Percivall, Open Geospatial Consortium

Reuse of common elements for geospatial information and processing results in increased productivity, lower interoperability friction, and higher data quality. This presentation provides a survey of reusable geospatial building blocks.

Common practices for coordinate reference systems (CRSs), spatial geometries and data arrays used for projects with geospatial content will be described based on open source projects and open standards. Emphasis is placed on the use of open standards including the recently updated OGC CRS Well Known Text (CRS WKT) and OGC APIs.

The presentation will provide the latest update on OGC API development. OGC APIs are being defined for geospatial resources, e.g., maps, features, and coverages. Because they are developed using OpenAPI, the APIs can be implemented in a number of languages and patterns. The presentation will describe the state of implementations and plans for standardization. The modular structure gives developers the flexibility to reuse OGC APIs in their own APIs.

This presentation will allow time for discussion of coordination across Apache projects with geospatial content. Geospatial tracks at previous ApacheCons concluded with an open discussion leading to the creation of geospatial@apache.org.

Geospatial Data Management in Apache Spark

Jia Yu and Mohamed Sarwat, Arizona State University

The volume of spatial data is increasing at a staggering rate. This talk comprehensively reviews how existing systems, such as GeoSpark, extend Apache Spark to support massive-scale spatial data. We first give a background introduction to the characteristics of spatial data and the history of distributed data management systems. A follow-up section presents the common approaches practitioners use to extend Spark and introduces the vital components of a generic spatial data management system. The third and fourth sections then discuss ongoing efforts and experience in spatial-temporal data and spatial data analytics, respectively. Finally, the fifth part concludes the talk, helping the audience grasp the overall content, and points out future research directions.

Apache Science Data Analytics Platform (SDAP)

Thomas Huang, Jet Propulsion Laboratory, California Institute of Technology

An Analytics Center Framework (ACF) is an environment that enables the confluence of resources for scientific investigation. It harmonizes data, tools, and computational resources so that the research community can focus on the investigation itself. The Earth science community is an innovative community; we produce many tools and solutions to improve how we do science. In computer science, a framework is a reusable, semi-complete application that can be specialized to produce custom applications [Johnson:88].

After more than two years of actively developing an open source ACF, in October 2017 the NASA AIST OceanWorks project established a collaboration with the Apache Software Foundation through the Apache Science Data Analytics Platform (SDAP). SDAP is a big data analytics platform designed for cloud-based data management, analytics, match-up, and data discovery services. It is a community-supported, extensible open source GIS platform. The motivation is to empower the Earth and Space Science Informatics community to develop a common big data solution for the cloud and for on-premise clusters. The platform is being used to support NASA Sea Level research, GRACE and GRACE Follow-On mission science, NASA Physical Oceanography, and more. This talk describes Apache SDAP and the lessons learned from developing SDAP and moving it into production to support various NASA and JPL research efforts.

GeoMesa on top of Accumulo, HBase, Cassandra, and big data file formats for massive geospatial data - a LocationTech Project

James Hughes, CCRI and Eddie Pickle, Radiant Solutions

LocationTech is the geospatial software working group of the Eclipse Foundation. Its projects range from fundamental libraries that provide spatial operations to complex library suites that coordinate multiple Apache projects to build complete spatial processing solutions.

LocationTech GeoMesa builds on top of distributed Apache data stores and streaming platforms such as Accumulo, HBase, Cassandra, and Kafka to provide indexing, querying, and analysis for large spatio-temporal datasets. GeoMesa does this by integrating other LocationTech projects, such as JTS, Spatial4J, and SFCurve, with these systems as well as with Apache open source file formats such as Avro, Arrow, ORC, and Parquet.

In this talk, we will give an overview of the geospatial capabilities that the foundational LocationTech libraries can bring to a project. With that background, we will discuss how GeoMesa integrates those capabilities into distributed databases and file formats. We will wrap up with a quick look at the other big geo-data projects in LocationTech (GeoTrellis, GeoWave, and RasterFrames).

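SFCurve, one of the libraries GeoMesa integrates, deals with space-filling curves, which map multi-dimensional coordinates onto a single sortable key that a distributed key-value store can range-scan. As a minimal sketch of one such curve, the Z-order (Morton) curve, the Python below normalizes longitude and latitude onto integer grids and interleaves their bits. The function names and the 31-bits-per-dimension choice are illustrative assumptions; this is not SFCurve's or GeoMesa's actual API or key layout.

```python
def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map value in [lo, hi] onto an integer cell in [0, 2**bits - 1]."""
    max_cell = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * max_cell)


def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y: x fills even positions, y odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2_key(lon: float, lat: float, bits: int = 31) -> int:
    """Hypothetical helper: Z-order key for a lon/lat point (2 * bits bits in total)."""
    x = normalize(lon, -180.0, 180.0, bits)
    y = normalize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


if __name__ == "__main__":
    # Points that are close in space usually share a long key prefix,
    # so a bounding-box query can be answered with a set of key-range scans.
    points = [("A", -74.0060, 40.7128), ("B", -74.0061, 40.7129), ("C", 2.3522, 48.8566)]
    for name, lon, lat in points:
        print(name, format(z2_key(lon, lat), "062b"))
```

Each successive pair of key bits refines the grid cell, which is why a shared prefix implies spatial proximity; the converse does not always hold for points on either side of a cell boundary.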
Geospatial Indexing and Search at Scale with Apache Lucene

Nick Knize, Elastic

Come have a look under the covers at the new data structures that enable geospatial and multi-dimensional indexing and search at massive scale in Apache Lucene. This talk will cover the indexing structures considered and ultimately implemented in the Apache Lucene open source project, along with the 25-30x performance boost and centimeter spatial accuracy achieved in the latest release. Join us and see what's next for scalable geospatial search in Apache Lucene.
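To see why centimeter-level accuracy is plausible, consider the encoding resolution. Lucene's `LatLonPoint` field stores each coordinate as a 32-bit integer; assuming that, the back-of-the-envelope sketch below splits latitude (-90 to 90) and longitude (-180 to 180) into 2^32 cells each and converts the cell width to metres, which works out to roughly 4.7 mm for latitude and 9.3 mm for longitude at the equator. It uses plain unsigned floor quantization and an approximate 111,320 m per degree, so it illustrates the arithmetic rather than reproducing Lucene's own encoding code.

```python
import math

# Approximate metres per degree along the equator (about 40,075 km / 360).
# An assumption used only for this back-of-the-envelope estimate.
METRES_PER_DEGREE = 111_320.0
BITS = 32  # bits per coordinate, matching the 32-bit integers in LatLonPoint


def step_degrees(lo: float, hi: float, bits: int = BITS) -> float:
    """Width in degrees of one cell when [lo, hi] is split into 2**bits cells."""
    return (hi - lo) / (1 << bits)


def encode(value: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Floor-quantize value into an unsigned cell index in [0, 2**bits - 1]."""
    cell = math.floor((value - lo) / step_degrees(lo, hi, bits))
    return min(cell, (1 << bits) - 1)  # value == hi would otherwise overflow


def decode(cell: int, lo: float, hi: float, bits: int = BITS) -> float:
    """Lower edge of the cell; the encoded value lies less than one step above it."""
    return lo + cell * step_degrees(lo, hi, bits)


lat_step_m = step_degrees(-90.0, 90.0) * METRES_PER_DEGREE
lon_step_m = step_degrees(-180.0, 180.0) * METRES_PER_DEGREE

if __name__ == "__main__":
    print(f"latitude cell:  {lat_step_m * 1000:.2f} mm")
    print(f"longitude cell: {lon_step_m * 1000:.2f} mm at the equator")
    lat = 40.7128
    error_m = (lat - decode(encode(lat, -90.0, 90.0), -90.0, 90.0)) * METRES_PER_DEGREE
    print(f"round-trip error for {lat}: {error_m * 1000:.2f} mm")
```

Longitude cells shrink toward the poles by a factor of cos(latitude), so the equator is the worst case for this estimate.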

GeoSpatial and Temporal Forecasting in Uber Marketplace

Chong Sun, Uber and Brian Tang, Uber

Uber's Marketplace is the algorithmic brain and decision engine behind our ride-sharing services. Marketplace Forecasting builds and deploys ML algorithms to handle the immense coordination, hyperlocal decision making, and learning needed to tackle the enormous scale and movement of our transportation network.

For our decision engines to be future-aware, we need to see into the future as accurately as possible across both space and time. At Uber, we use H3 (a hexagonal hierarchical geospatial indexing system) to partition the data geospatially. To incorporate both geospatial and temporal features into the forecasting models in real time, we need efficient and scalable data processing techniques.

In this presentation, we provide an overview of the geospatial and temporal forecasting problem in Uber Marketplace Forecasting. Then we use real-time forecasting as an example to show the need for efficient data smoothing, and describe a hexagon-convolution-based approach for processing H3 data in production. We will demonstrate how hexagon convolution can be incorporated with Apache Flink to scale data smoothing for forecasting.
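The abstract does not give Uber's kernel, so the following is only a hypothetical illustration of what a hexagon convolution for smoothing can look like: each cell's smoothed value is a weighted sum of its own value and its six neighbours' values. To stay self-contained, it uses axial (q, r) coordinates on a plain hexagonal grid as a stand-in for H3 cell IDs; with H3 itself, neighbours would come from the library's k-ring (grid disk in H3 v4) functions, and pentagon cells, which have only five neighbours, would need special handling. The weights and the `hex_smooth` name are assumptions.

```python
# Axial (q, r) offsets of the six neighbours of a hexagon on a regular hex grid.
HEX_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_smooth(values: dict, center_weight: float = 0.5) -> dict:
    """One pass of a 1-ring hexagon convolution (hypothetical kernel).

    smoothed(c) = center_weight * v(c) + (1 - center_weight) / 6 * sum of v over c's ring,
    with missing cells treated as 0. The kernel weights sum to 1 and the output
    covers every cell that receives weight, so the total volume is preserved.
    """
    ring_weight = (1.0 - center_weight) / 6
    # Output cells: every input cell plus its ring, so volume spilling outward is kept.
    cells = set(values)
    for q, r in values:
        cells.update((q + dq, r + dr) for dq, dr in HEX_NEIGHBOURS)
    smoothed = {}
    for q, r in cells:
        total = center_weight * values.get((q, r), 0.0)
        for dq, dr in HEX_NEIGHBOURS:
            total += ring_weight * values.get((q + dq, r + dr), 0.0)
        smoothed[(q, r)] = total
    return smoothed


if __name__ == "__main__":
    spike = {(0, 0): 12.0}  # hypothetical demand concentrated in one cell
    smoothed = hex_smooth(spike)
    # The centre keeps 6.0 and each of the six neighbours receives 1.0.
    for cell in sorted(smoothed):
        print(cell, round(smoothed[cell], 6))
```

Because each pass only reads a cell's immediate ring, the computation is local and can be applied independently per cell, which is the property that makes this kind of smoothing amenable to a streaming engine; repeated passes widen the effective kernel.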

Realtime Geospatial Analytics with GPUs, RAPIDS, and Apache Arrow

Josh Patterson, NVIDIA

Prior to RAPIDS, geospatial analytics, especially network analytics for routing, required large CPU clusters. Even with hundreds of machines, it could take hours if not days to get answers. With RAPIDS, an Apache 2.0-licensed open source project built on Apache Arrow, graph analytics, clustering, and many other geospatial workflows can be completed end to end in seconds. Learn how to load data (CSV, Parquet, or ORC) directly into GPU memory with cuIO, process it with Dask-cuDF, and analyze it with cuML and cuGraph in seconds, all on a single node. Finally, learn how RAPIDS is scaling to multiple GPU nodes to solve the largest geospatial challenges.