The best Data Where-house is on Databricks with full geospatial support
by Kent Marten
A hurricane is forming in the Florida Gulf. As an insurer, you need to answer key questions for the business immediately: identify the policies inside the projected storm paths, the total insured value at risk, the worst-exposed counties, and which reinsurance partners need to be notified.
Not long ago, answering these spatial questions meant stitching together multiple systems: a spatial database for the intersections, a warehouse for the policy data, and a visualization tool to map the results for analysts and underwriters. You might even have replicated the policy data into an external system. Every extra system adds risk, and every copy of the data fragments governance.
Today, spatial work can happen on one platform. Spatial SQL is now Generally Available. Databricks is a geospatial lakehouse. The era of bolting a spatial database onto a warehouse onto a mapping tool is over. Store data as Geometry in Iceberg or Delta, run spatial queries at scale, call 90+ spatial functions, share through Delta Sharing, and explore in Genie, while Unity Catalog handles the governance.
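To make the opening questions concrete, here is a minimal plain-Python sketch of the logic behind them: test which policies fall inside a forecast polygon, total the insured value at risk, and rank counties by exposure. It is only an illustration of the technique, not Databricks code. The storm polygon, policy records, and field names are hypothetical, and the ray-casting test stands in for what a spatial predicate such as ST_Within would evaluate inside a spatial join.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical forecast cone as (longitude, latitude) vertices.
storm_cone = [(-84.0, 26.0), (-81.0, 26.0), (-81.0, 29.0), (-84.0, 29.0)]

# Hypothetical policies; "tiv" is total insured value in dollars.
policies = [
    {"id": "P1", "county": "Lee", "lon": -81.9, "lat": 26.6, "tiv": 1_200_000},
    {"id": "P2", "county": "Lee", "lon": -81.8, "lat": 26.5, "tiv": 800_000},
    {"id": "P3", "county": "Pinellas", "lon": -82.7, "lat": 27.9, "tiv": 500_000},
    {"id": "P4", "county": "Duval", "lon": -81.6, "lat": 30.3, "tiv": 2_000_000},
]

# Policies inside the projected storm path.
at_risk = [p for p in policies if point_in_polygon(p["lon"], p["lat"], storm_cone)]

# Total insured value at risk.
total_tiv_at_risk = sum(p["tiv"] for p in at_risk)

# Worst-exposed counties, highest exposure first.
tiv_by_county = defaultdict(int)
for p in at_risk:
    tiv_by_county[p["county"]] += p["tiv"]
worst_counties = sorted(tiv_by_county.items(), key=lambda kv: kv[1], reverse=True)

print(total_tiv_at_risk)
print(worst_counties)
```

At warehouse scale the same pattern is a spatial join followed by an aggregation, which is where the platform's spatial indexing and join optimizations matter; the loop above checks every policy against the polygon and would not scale.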
Databricks customers love the value the platform delivers:
Spatial SQL allows us to simplify ETL workloads, ensure performant queries, and collapse complex geospatial architectures using completely open data types with Delta Lake. We saw 70% faster queries while unlocking analytical capabilities that weren't possible before. S&P Global Energy empowers customers with a comprehensive view of global energy and commodities markets that creates long-term sustainable value. — Hubert Boguski, Software Engineer II, S&P Global Energy
Under the time pressure of an approaching hurricane, every second counts. That is why we have continuously improved the out-of-the-box performance of spatial joins and ST_ functions since Public Preview. To measure the latest improvements, we ran a comprehensive benchmark using SpatialBench: 8 of its 12 queries improved since Public Preview, with gains ranging from 20% to 15X.
For boolean set operations (ST_Intersection, ST_Difference, ST_Union) we’ve introduced improved algorithms. These functions can help answer questions like, “Which parts of my land parcels lie inside the projected hurricane path?” and “What's the combined coverage of all our cell towers in this area?” With these operators, Databricks is now 2X faster on average on areal datasets than in prior versions. No code changes are required: your existing queries just got faster.
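As a rough illustration of what an intersection operator computes for the land-parcel question, the plain-Python sketch below clips a parcel polygon to a storm-path polygon and measures how much of the parcel lies inside. It uses the textbook Sutherland-Hodgman clipping algorithm, which is an assumption chosen for brevity and not the algorithm Databricks uses. It only works when the clipping polygon is convex and listed counter-clockwise, and the coordinates are hypothetical planar units.

```python
def polygon_area(ring):
    """Shoelace formula: unsigned area of a simple polygon given as (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` to a convex, counter-clockwise `clip` ring."""

    def is_inside(p, a, b):
        # True if p is on or to the left of the directed clip edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p, q, a, b):
        # Point where segment p -> q meets the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]  # edge from previous vertex to this one
            if is_inside(q, a, b):
                if not is_inside(p, a, b):
                    output.append(crossing(p, q, a, b))  # entering the clip region
                output.append(q)
            elif is_inside(p, a, b):
                output.append(crossing(p, q, a, b))  # leaving the clip region
        if not output:
            break  # nothing left: the polygons do not overlap
    return output


# Hypothetical parcel and storm path, both rectangles in planar units.
parcel = [(0, 0), (4, 0), (4, 2), (0, 2)]
storm_path = [(2, -1), (6, -1), (6, 3), (2, 3)]  # convex, counter-clockwise

inside_part = clip_to_convex(parcel, storm_path)
share_inside = polygon_area(inside_part) / polygon_area(parcel)
print(polygon_area(inside_part), share_inside)
```

The areas of the other two operations follow from the intersection: the difference is the parcel area minus the overlap, and the union is the sum of both areas minus the overlap. Production-grade operators also have to handle concave shapes, holes, and multi-part geometries, which this sketch does not attempt.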
These are the spatial operations that drive efficiency for Databricks customers like Top Chrono, who specialize in Premium Courier and Last-Mile Delivery services.
Databricks Spatial SQL replaced our reliance on third-party libraries that were troublesome to maintain and required SQL UDFs for basic operations. Today we use ST_Transform to project trips into Lambert 93 (France) for precise distances, ST_Within to detect deliveries entering customer zones, ST_Union to merge overlapping driver routes, and more. Databricks provides the complete high-performance spatial toolkit that scales with our delivery operation. — Maxime Delobelle, Lead Data Architect, Top Chrono
For spatial questions, the best way to share results is often a map. As part of the Spatial SQL GA, AI/BI now renders maps using Geometry or Geography columns. No more custom applications or third-party mapping tools to visualize your geo data.
When the underwriter opens the hurricane-exposure dashboard, the at-risk policies, the hurricane path, and historical tracks can all be part of the visual. You can filter by county, compare different forecasted paths, or slice the data as you see fit.
And the underwriter doesn't have to write SQL to get there. Genie Code can generate the right dashboard with a single prompt.
Genie reasons over geospatial columns the same way it reasons over any other column. You can type "Show me policies in the Florida counties in the hurricane forecast, where total insured value is over $1M," and Genie generates the spatial query, respects Unity Catalog row filters, and can produce a dashboard with maps as needed.
Risk and exposure data needs to be shareable. Reinsurance partners need the policy-level cession files. Emergency management agencies need to share data internally and externally. Every one of those exchanges could be a custom data-extract pipeline.
Now with Spatial SQL GA, tables with geo columns are supported by Delta Sharing. The insurer publishes a single Delta Share that contains the policy boundary, and the underwriter's reinsurance partner reads from it directly, with no data extraction or schema translation. Access is governed by Unity Catalog policies, and lineage is tracked.
Databricks' openness for geo now extends to the underlying table format. Using Spatial SQL, you can now read and write to managed Iceberg tables, and read from Iceberg tables written externally. Iceberg v3 support on Databricks is already GA and now extends to geospatial data types. The open lakehouse means standards over silos.
What's GA today
Spatial SQL on Databricks includes:
Note: Geography will remain in Public Preview until it is fully supported across common spatial functions.
The Databricks Platform now supports working with Geospatial data types in:
This blog describes a scenario for an insurance company, but geospatial context is important across all domains:
The open lakehouse story doesn't stop at the Databricks platform. Databricks is contributing GEOMETRY and GEOGRAPHY types to Apache Spark 4.2 (scheduled for summer 2026). The geometry and geography types you're querying on Databricks today will be available as first-class types to every Spark community user.
Provide your feedback to the Product team
If you would like to request additional map visualization capabilities, ST expressions, or other geospatial features, please fill out this short feedback survey.