From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%
Overview
Detail | Value |
---|---|
Experience | In Person |
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Energy and Utilities, Public Sector, Financial Services |
Technologies | Apache Spark, Delta Lake, Databricks Workflows |
Skill Level | Intermediate |
Duration | 40 min |
The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Before adopting Databricks, responding to the Department's requests required querying approximately five hundred thousand raster files representing more than five hundred billion points. By building a lakehouse architecture with Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we reduced our "time to analysis" from multiple business days to a matter of seconds. Our data scientists now query pre-computed tables in Databricks, making "time to analysis" 99% faster and giving our teams more time for deeper analysis of the data. We have also incorporated Databricks Workflows, Databricks Asset Bundles, Git and GitHub Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.
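The core idea behind querying "pre-computed tables" is to pay the cost of scanning every raster point once, ahead of time, and store per-cell aggregates keyed by a spatial index so that each later query becomes a lookup. The pure-Python sketch below illustrates only that pattern. It is not the Global Water Security Center's pipeline: the names `cell_id`, `precompute` and `query` are hypothetical, and `cell_id` uses a simple fixed-size latitude/longitude grid as a stand-in for H3 cells.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellStats:
    """Running aggregate for all points that fall in one spatial cell."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def cell_id(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Hypothetical stand-in for an H3 index: a fixed-size lat/lon grid cell."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def precompute(points, size_deg: float = 1.0) -> dict:
    """One expensive pass over all (lat, lon, value) points, done ahead of time."""
    table = defaultdict(CellStats)
    for lat, lon, value in points:
        stats = table[cell_id(lat, lon, size_deg)]
        stats.count += 1
        stats.total += value
    return dict(table)


def query(table: dict, lat: float, lon: float, size_deg: float = 1.0) -> CellStats:
    """At query time, a single dictionary lookup replaces a scan of every point."""
    return table.get(cell_id(lat, lon, size_deg), CellStats())


# Illustrative points only: the first two share the cell (38, -78).
points = [(38.9, -77.2, 2.0), (38.2, -77.5, 4.0), (40.1, -74.0, 10.0)]
table = precompute(points)
print(query(table, 38.5, -77.3).mean)  # 3.0
```

In a lakehouse setting, the same trade-off would plausibly be expressed as a Delta table keyed by H3 cell and clustered on that key, so that a query reads only the files covering the requested cells instead of every raster.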
Session Speakers
Chris Crawford
Sr. Solutions Architect
Databricks
Hobson Bryan
Associate Director of Technology
Global Water Security Center