
Delta Live Tables

Reliable data pipelines made easy


Delta Live Tables (DLT) is a declarative ETL framework for the Databricks Data Intelligence Platform that helps data teams simplify streaming and batch ETL cost-effectively. Simply define the transformations to perform on your data and let DLT pipelines automatically manage task orchestration, cluster management, monitoring, data quality and error handling.


Efficient data ingestion

Building production-ready ETL pipelines begins with ingestion. DLT powers easy, efficient ingestion for your entire team — from data engineers and Python developers to data scientists and SQL analysts. With DLT, load data from any data source supported by Apache Spark™ on Databricks. 
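
As a rough sketch of what this looks like in practice, the Python example below defines an ingestion step with Auto Loader inside a DLT pipeline. The dlt module is only available in a Databricks DLT pipeline, and the storage path and table name are hypothetical.

```python
# Minimal ingestion sketch, assuming it runs inside a Databricks DLT pipeline
# (where the dlt module and a spark session are provided).
# Path and table name are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested incrementally from cloud storage.")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")             # Auto Loader source
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/landing/events/")      # hypothetical path
        .withColumn("ingested_at", F.current_timestamp())
    )
```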

“I love Delta Live Tables because it goes beyond the capabilities of Auto Loader to make it even easier to read files. My jaw dropped when we were able to set up a streaming pipeline in 45 minutes.”

— Kahveh Saramout, Senior Data Engineer, Labelbox


Intelligent, cost-effective data transformation

From just a few lines of code, DLT determines the most efficient way to build and execute your streaming or batch data pipelines, optimizing for price/performance (nearly 4x the Databricks baseline) while minimizing complexity.
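
As an illustration of “a few lines of code”, the sketch below adds a downstream transformation to the same hypothetical pipeline; DLT resolves the dependency on the ingestion table and decides how to build and refresh the result. Table and column names are hypothetical.

```python
# Minimal transformation sketch; table and column names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Daily event counts per user, maintained by DLT.")
def daily_event_counts():
    return (
        dlt.read("raw_events")                            # table defined upstream
        .groupBy("user_id", F.to_date("ingested_at").alias("event_date"))
        .count()
    )
```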

“Delta Live Tables has helped our teams save time and effort in managing data at the multitrillion-record scale and continuously improves our AI engineering capability . . . Databricks is disrupting the ETL and data warehouse markets.”

— Dan Jeavons, General Manager Data Science, Shell


Simple pipeline setup and maintenance

DLT pipelines simplify ETL development by automating away virtually all the inherent operational complexity. With DLT pipelines, engineers can focus on delivering high-quality data rather than operating and maintaining pipelines. DLT automatically handles task orchestration, cluster management, monitoring, data quality and error handling.
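
Data quality is one example: the sketch below declares expectations on a table so DLT can enforce them during every update. The constraint names and columns are illustrative, not taken from this page.

```python
# Minimal data quality sketch using DLT expectations (illustrative names).
import dlt

@dlt.table(comment="Events that pass basic quality checks.")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")  # drop failing rows
@dlt.expect("recent_event", "ingested_at >= '2020-01-01'")     # log violations, keep rows
def clean_events():
    return dlt.read_stream("raw_events")
```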

“Complex architectures, such as dynamic schema management and stateful/stateless transformations, were challenging to implement with a classic multicloud data warehouse architecture. Both data scientists and data engineers can now perform such changes using scalable Delta Live Tables with no barriers to entry.”

— Sai Ravuru, Senior Manager of Data Science and Analytics, JetBlue


Next-gen stream processing engine

Spark Structured Streaming is the core technology that unlocks streaming DLT pipelines, providing a unified API for batch and stream processing. DLT pipelines leverage Spark Structured Streaming's inherent subsecond latency and record-breaking price/performance. Although you can build your own performant streaming pipelines by hand with Spark Structured Streaming, DLT pipelines may provide faster time to value, better ongoing development velocity and lower TCO because of the operational overhead they manage automatically.
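
For comparison, a hand-built Structured Streaming job might look like the sketch below (paths and table names are hypothetical). Checkpointing, retries, monitoring and quality checks are all yours to wire up and operate, which is the overhead a DLT pipeline takes on for you.

```python
# Sketch of a "build your own" Structured Streaming job; everything here is
# managed manually. Paths and names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("manual-events-pipeline").getOrCreate()

raw = (
    spark.readStream.format("cloudFiles")                   # Auto Loader source
    .option("cloudFiles.format", "json")
    .load("s3://example-bucket/landing/events/")
)

cleaned = raw.where(F.col("event_id").isNotNull())          # hand-rolled quality check

query = (
    cleaned.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .toTable("analytics.events_cleaned")                    # hypothetical target
)
query.awaitTermination()
```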

“We didn’t have to do anything to get DLT to scale. We give the system more data, and it copes. Out of the box, it’s given us the confidence that it will handle whatever we throw at it.”

— Dr. Chris Inkpen, Global Solutions Architect, Honeywell

Delta Live Tables pipelines vs. “build your own” Spark Structured Streaming pipelines

| Capability | Spark Structured Streaming pipelines | DLT pipelines |
|---|---|---|
| Run on the Databricks Data Intelligence Platform | ✓ | ✓ |
| Powered by Spark Structured Streaming engine | ✓ | ✓ |
| Unity Catalog integration | ✓ | ✓ |
| Orchestrate with Databricks Workflows | ✓ | ✓ |
| Ingest from dozens of sources — from cloud storage to message buses | ✓ | ✓ |
| Dataflow orchestration | Manual | Automated |
| Data quality checks and assurance | Manual | Automated |
| Error handling and failure recovery | Manual | Automated |
| CI/CD and version control | Manual | Automated |
| Compute autoscaling | Basic | Enhanced |


Unified data governance and storage

Running DLT pipelines on Databricks means you benefit from the foundational components of the Data Intelligence Platform built on lakehouse architecture — Unity Catalog and Delta Lake. Your raw data is optimized with Delta Lake, the only open source storage framework designed from the ground up for both streaming and batch data. Unity Catalog gives you fine-grained, integrated governance for all your data and AI assets with one consistent model to discover, access and share data across clouds. Unity Catalog also provides native support for Delta Sharing, the industry’s first open protocol for simple and secure data sharing with other organizations.

“We are incredibly excited about the integration of Delta Live Tables with Unity Catalog. This integration will help us streamline and automate data governance for our DLT pipelines, helping us meet our sensitive data and security requirements as we ingest millions of events in real time. This opens up a world of potential and enhancements for our business use cases related to risk modeling and fraud detection.”

— Yue Zhang, Staff Software Engineer, Block

FAQs

DLT pipelines are built from two fundamental building blocks: Streaming Tables and Materialized Views. Both are built on the reliable standards of Delta Tables and Spark Structured Streaming.
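
As a sketch (source and column names are hypothetical), the two building blocks look like this in a Python DLT pipeline: a streaming read defines a Streaming Table, while a batch read over it defines a Materialized View.

```python
# Minimal sketch of the two DLT building blocks; names are hypothetical.
import dlt

@dlt.table  # Streaming Table: processes new orders incrementally
def orders_stream():
    return spark.readStream.table("raw.orders")

@dlt.table  # Materialized View: kept up to date from its inputs by DLT
def orders_by_status():
    return dlt.read("orders_stream").groupBy("order_status").count()
```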

Resources