Creating Reusable Geospatial Pipelines

May 26, 2021 12:05 PM (PT)


Geospatial pipelines in Apache Spark are difficult to build because the datasets are so diverse and must be harmonized into a single dataframe. Over the past year we have reviewed pipeline tools that let us quickly combine steps into new workflows or apply them to new datasets: Dagster, Apache Spark MLflow pipelines, Prefect, and our own custom solutions. This talk covers the pros and cons of each of these solutions and presents an actionable workflow implementation that any geospatial analyst can use. We will show how a pipeline can run a traditional geospatial hotspot analysis, and we will demonstrate interactive mapping within the Databricks platform.
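
To make the idea of combining reusable steps concrete, here is a minimal sketch in plain Python. It is not the implementation presented in the talk and does not use Dagster, Prefect, MLflow, or Spark. The helper names (pipeline, bin_to_grid, count_per_cell, flag_hotspots) and the sample points are hypothetical. The hotspot rule is a simple z-score on per-cell point counts, assumed here for illustration rather than a formal hotspot statistic.

```python
from collections import Counter
from functools import reduce
from statistics import mean, pstdev
from typing import Any, Callable

# A step is any callable that takes the previous step's output.
Step = Callable[[Any], Any]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into one reusable callable (hypothetical helper)."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def bin_to_grid(cell_size: float) -> Step:
    """Build a step that maps (x, y) points to integer grid-cell indices."""
    def step(points):
        return [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    return step


def count_per_cell(cells):
    """Count how many points fall in each grid cell."""
    return Counter(cells)


def flag_hotspots(z_threshold: float) -> Step:
    """Build a step that keeps cells whose count z-score meets the threshold.

    Assumption: a plain z-score over cell counts stands in for a real
    hotspot statistic.
    """
    def step(counts):
        values = list(counts.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            return {}  # all cells equally dense, so nothing stands out
        return {cell: n for cell, n in counts.items()
                if (n - mu) / sigma >= z_threshold}
    return step


# The same composed pipeline can be reused on any list of (x, y) points.
hotspot_pipeline = pipeline(bin_to_grid(1.0), count_per_cell, flag_hotspots(1.0))

# Made-up sample data: four points cluster in cell (0, 0).
points = [(0.1, 0.2), (0.5, 0.9), (0.7, 0.3), (0.4, 0.4),
          (2.5, 2.5), (5.1, 0.2), (3.3, 4.8)]
hotspots = hotspot_pipeline(points)
print(hotspots)
```

Because each step is an ordinary function, swapping the grid size, the threshold, or the input dataset only requires composing a new pipeline, which is the kind of reuse the pipeline tools above aim to provide at scale.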

In this session watch:
Dan Corbiani, Data Scientist and Solutions Architect, Pacific Northwest National Lab


Dan Corbiani

Dan Corbiani is a Data Scientist and Solutions Architect who designs, develops, and deploys analytic solutions for research programs. His primary thrust area is the intersection of large-scale geospat...