Databricks Delta

Bringing unprecedented data reliability and performance to
cloud data lakes for Apache Spark workloads

Enables Fast Queries at Massive Scale

Delta automatically indexes, compacts, and caches data, helping achieve up to 100x better performance than Apache Spark alone. Delta delivers these performance optimizations by automatically capturing statistics on the data and applying a variety of techniques for efficient querying.
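As a purely illustrative sketch of how these optimizations are typically triggered in a Databricks notebook (the table name events and its columns date and eventType are hypothetical, not from this page):

```python
# Illustrative sketch; a Databricks notebook provides the `spark` session.
# The Delta table `events` and its columns are hypothetical examples.

# Compact small files and co-locate related records to speed up scans.
spark.sql("OPTIMIZE events ZORDER BY (eventType)")

# Subsequent queries benefit from the compacted layout and the per-file
# statistics Delta captures for data skipping.
spark.sql("""
    SELECT eventType, count(*) AS cnt
    FROM events
    WHERE date >= '2018-01-01'
    GROUP BY eventType
    ORDER BY cnt DESC
""").show()
```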

Makes Data Reliable for Analytics

Delta provides fully ACID-compliant transactions and enforces schema on write, giving data teams the controls they need to ensure data reliability. Delta's upsert capability provides a simple way to clean data and apply new business logic without reprocessing data.
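A minimal sketch of that upsert workflow, assuming two tables registered in a Databricks notebook: a Delta table customers and a batch of changes customer_updates (both names, and the columns used, are hypothetical). Delta exposes upserts through the MERGE INTO SQL command:

```python
# Illustrative upsert sketch; `spark` is provided by the Databricks notebook.
# Table and column names are hypothetical.
spark.sql("""
    MERGE INTO customers AS target
    USING customer_updates AS source
    ON target.customerId = source.customerId
    WHEN MATCHED THEN
      UPDATE SET target.email = source.email, target.updatedAt = source.updatedAt
    WHEN NOT MATCHED THEN
      INSERT (customerId, email, updatedAt)
      VALUES (source.customerId, source.email, source.updatedAt)
""")
```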

Simplifies Data Engineering

Delta dramatically simplifies data pipelines by providing a common API for transactionally storing large historical and streaming datasets in cloud blob stores, and by making these massive datasets available for high-performance analytics.
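As a rough sketch of that common API (the paths, schema, and column names below are assumptions, not from this page), the same Delta table can receive both a historical batch load and a continuous stream, and remain queryable throughout:

```python
# Illustrative sketch; `spark` is the session provided by a Databricks notebook.
# All paths and column names are hypothetical.
events_path = "/delta/events"

# Batch: load a large historical dataset into the Delta table transactionally.
historical = spark.read.json("/data/historical-events")
historical.write.format("delta").mode("append").save(events_path)

# Streaming: continuously append new records to the same table.
query = (spark.readStream
         .schema(historical.schema)
         .json("/data/incoming-events")
         .writeStream
         .format("delta")
         .option("checkpointLocation", "/delta/events/_checkpoints")
         .start(events_path))

# The same table is immediately available for high-performance analytics.
spark.read.format("delta").load(events_path).groupBy("eventType").count().show()
```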

Natively Integrates with the Unified Analytics Platform

Databricks Delta, a key component of the Databricks Runtime, enables data scientists to explore and visualize data and to combine it seamlessly with various ML frameworks (TensorFlow, Keras, scikit-learn, etc.) to build models. As a result, Delta can be used not only to run SQL queries but also for machine learning on large amounts of streaming data using the Databricks Workspace.
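A small sketch of what such a workflow could look like (the Delta table path, feature columns, and choice of scikit-learn model are illustrative assumptions):

```python
# Illustrative sketch; `spark` comes from the Databricks notebook, and the
# Delta table path, feature columns, and label are hypothetical.
from sklearn.linear_model import LogisticRegression

pdf = (spark.read.format("delta")
       .load("/delta/events")
       .select("feature1", "feature2", "label")
       .toPandas())

model = LogisticRegression()
model.fit(pdf[["feature1", "feature2"]], pdf["label"])
print("training accuracy:", model.score(pdf[["feature1", "feature2"]], pdf["label"]))
```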

At Edmunds, obtaining real-time customer and revenue insights is critical to our business. But we’ve always been challenged with complex ETL processing that slows down our access to data.

Greg Rokita

Executive Director of Technology, Edmunds.com