LAKEFLOW SPARK DECLARATIVE PIPELINES

Reliable data pipelines made easy

Simplify batch and streaming ETL with automated reliability and built-in data quality.

Take the Product Tour Explore the Documentation

TOP TEAMS SUCCEED WITH INTELLIGENT DATA PIPELINES

Learn how to build ETL pipelines with SQL

Build batch and real-time ETL pipelines using SQL. No data engineering support needed.

Read now

BENEFITS

Data pipeline best practices, codified

Simply declare the data transformations you need — let Spark Declarative Pipelines handle the rest.

Efficient ingestion

Building production-ready data pipelines starts with ingestion. Spark Declarative Pipelines enables efficient ingestion for data engineers, Python developers, data scientists and SQL analysts. Load data from any Apache Spark™-supported source on Databricks, whether batch, streaming or CDC.

Intelligent transformation

From just a few lines of code, Spark Declarative Pipelines determines the most efficient way to build and execute your batch or streaming data pipelines, automatically optimizing for cost or performance while minimizing complexity.

Automated operations

Spark Declarative Pipelines simplifies pipeline development by codifying best practices out of the box, automating dependency management, scaling and recovery, data quality rules and more. With Spark Declarative Pipelines, engineers can focus on delivering high-quality data rather than operating and maintaining pipeline infrastructure.

FEATURES

Built to simplify data pipelining

Building and operating data pipelines can be hard — but it doesn’t have to be. Spark Declarative Pipelines is built for powerful simplicity, so you can perform robust ETL with just a few lines of code.

Leveraging Spark’s unified API for batch and stream processing, Spark Declarative Pipelines allows you to easily toggle between processing modes.

Learn more

Spark Declarative Pipelines makes it easy to optimize pipeline performance by declaring an entire incremental data pipeline with streaming tables and materialized views.

Learn more

Spark Declarative Pipelines supports a broad ecosystem of sources and sinks. Load data from any source — including cloud storage, message buses, change data feeds, databases and enterprise apps.

Learn more

Expectations allow you to guarantee data arriving in tables meets data quality requirements and provides insights on data quality with each pipeline update.

Learn more

Develop pipelines in the IDE for Data Engineering without any context switching. See the DAG, data preview and execution insights in one UI. Develop code easily with autocomplete, in-line errors and diagnostics.

Learn more

More features

Unified governance and storage

Built on the foundational lakehouse standards of Unity Catalog and open table formats.

Learn more

Serverless compute

Up to 5x better price/performance for data ingestion and 98% cost savings for complex transformations.

Learn more

Task orchestration

Instead of manually defining a series of separate Apache Spark™ tasks, you define the transformations, and Spark Declarative Pipelines ensures they are executed in the correct sequence.

Learn more

Error handling and failure recovery

Seamless recovery from errors that occur during the execution of data pipelines.

Learn more

CI/CD and version control

Easily specify configurations to isolate pipelines in developing, testing and production environments.

Learn more

Pipeline monitoring and observability

Built-in monitoring and observability features, including data lineage, update history and data quality reporting.

Learn more

Flexible refresh scheduling

Easily optimize for latency or cost depending on your pipeline’s requirements.

Learn more

USE CASES

Streamline your data pipelines

Easily ensure data integrity and consistency

Simplify change data capture with the APPLY CHANGES APIs for change data feeds and database snapshots. Spark Declarative Pipelines automatically handles out-of-sequence records for SCD Type 1 and 2, simplifying the hardest parts of CDC.

Get started

Unlock powerful real-time use cases without extra tooling

Build and run your batch and streaming data pipelines in one place with controllable and automated refresh settings, saving time and reducing operational complexity. Operationalize streaming data to immediately improve the accuracy and actionability of your analytics and AI.

Get started

Seamlessly bring data engineering best practices to the world of data warehousing

With Spark Declarative Pipelines, data warehouse users have the full power of declarative ETL via an accessible SQL interface. Empower your SQL analysts with low-code, infrastructure-free data pipelines, unlocking fresh data for the business with minimal setup or dependencies.

Get started

Explore Spark Declarative Pipelines demos

See all demos

PRODUCT TOUR

Spark Declarative Pipelines Product Tour

demo center lakeflow declarative pipelines

VIDEO

Spark Declarative Pipelines Real-Time Avionics Demo

VIDEO

Data Engineering with Lakeflow

VIDEO

Building a Data Application with Lakeflow

PRICING

Usage-based pricing keeps spending in check

Only pay for the products you use at per-second granularity.

Explore pricing

Discover more

Explore other integrated, intelligent offerings on the Data Intelligence Platform.

Lakeflow Connect

Efficient data ingestion connectors from any source and native integration with the Data Intelligence Platform unlock easy access to analytics and AI, with unified governance.

Lakeflow Jobs

Easily define, manage and monitor multitask workflows for ETL, analytics and machine learning pipelines. With a wide range of supported task types, deep observability capabilities and high reliability, your data teams are empowered to better automate and orchestrate any pipeline and become more productive.

Lakehouse Storage

Unify the data in your lakehouse, across all formats and types, for all your analytics and AI workloads.

Unity Catalog

Seamlessly govern all your data assets with the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform.

The Data Intelligence Platform

Find out how the Databricks Data Intelligence Platform enables your data and AI workloads.

Take the next step

Explore the Spark Declarative Pipelines docs

Everything you need to get started using Spark Declarative Pipelines on the AWS, Microsoft Azure or Google Cloud Platform environments.

Start a free trial

Test-drive the full Databricks Platform for free.

Spark Declarative Pipelines FAQ

Ready to become a data + AI company?

Take the first steps in your transformation

Try for free Contact Sales

Reliable data pipelines made easy

Data pipeline best practices, codified

Efficient ingestion

Intelligent transformation

Automated operations

Built to simplify data pipelining

More features

Unified governance and storage

Serverless compute

Task orchestration

Error handling and failure recovery

CI/CD and version control

Pipeline monitoring and observability

Flexible refresh scheduling

Streamline your data pipelines

Make sources, transformations and destinations simple

Easily ensure data integrity and consistency

Unlock powerful real-time use cases without extra tooling

Seamlessly bring data engineering best practices to the world of data warehousing

Explore Spark Declarative Pipelines demos

Usage-based pricing keeps spending in check

Discover more

Lakeflow Connect

Lakeflow Jobs

Lakehouse Storage

Unity Catalog

The Data Intelligence Platform

Take the next step

Explore the Spark Declarative Pipelines docs

Start a free trial

Related content

Spark Declarative Pipelines FAQ