Delta Lake

Brings data reliability and performance to your data lakes

Delta Lake brings reliability, performance, and lifecycle management to data lakes. No more malformed data ingestion, no more struggling to delete data for compliance, no more trouble modifying data for change data capture. Accelerate how quickly high-quality data lands in your data lake, and how quickly teams can put that data to work, with a secure and scalable cloud service.





Delta Lake is an open source project hosted by the Linux Foundation. Data is stored in the open Apache Parquet format, allowing it to be read by any compatible reader. APIs are open and compatible with Apache Spark™.


Data lakes often have data quality issues, due to a lack of control over ingested data. Delta Lake adds a storage layer to data lakes to manage data quality, ensuring data lakes contain only high quality data for consumers.


Handle changing records and evolving schemas as business requirements change. And go beyond Lambda architecture with truly unified streaming and batch using the same engine, APIs, and code.



ACID Transactions: Multiple data pipelines can read and write data concurrently to a data lake. ACID Transactions ensure data integrity with serializability, the strongest level of isolation. Learn more at Diving into Delta Lake: Unpacking the Transaction Log.
Updates and Deletes: Delta Lake provides DML APIs to merge, update and delete datasets. This allows you to easily comply with GDPR/CCPA and simplify change data capture.
Schema Enforcement: Specify your data lake schema and enforce it, ensuring that data types are correct and required columns are present, and preventing bad data from causing data corruption. For more information, refer to Diving Into Delta Lake: Schema Enforcement & Evolution.
Time Travel (Data Versioning): Data snapshots enable developers to access and revert to earlier versions of data to audit data changes, rollback bad updates or reproduce experiments. Learn more in Introducing Delta Lake Time Travel for Large Scale Data Lakes.
Scalable Metadata Handling: Delta Lake treats metadata just like data, leveraging Spark’s distributed processing power. This allows for petabyte-scale tables with billions of partitions and files.
Open Format: All data in Delta Lake is stored in Apache Parquet format enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.
Unified Batch and Streaming Source and Sink: A table in Delta Lake is both a batch table and a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all just work out of the box.
Schema Evolution: Big data is continuously changing. Delta Lake enables you to make changes to a table schema that can be applied automatically, without the need for cumbersome DDL.
Audit History: The Delta Lake transaction log records details about every change made to data, providing a full history of changes, for compliance, audit, and reproduction.
100% Compatible with Apache Spark API: Developers can use Delta Lake with their existing data pipelines with minimal change, as it is fully compatible with Spark, the most widely used big data processing engine.
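The schema-enforcement idea above can be sketched in a few lines: declare the expected columns and types, and reject an entire batch if any record fails validation, so bad data never reaches the table. This is a minimal, stdlib-only Python illustration of the concept; the schema dictionary and helper functions are hypothetical, not Delta Lake APIs.

```python
# Hypothetical sketch of schema-on-write enforcement, the idea behind
# Delta Lake's Schema Enforcement feature. Not a Delta Lake API.

REQUIRED_SCHEMA = {"id": int, "event": str, "ts": float}

def validate(record):
    """Reject records with missing required columns or wrong types."""
    for column, expected_type in REQUIRED_SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing required column: {column}")
        if not isinstance(record[column], expected_type):
            raise TypeError(f"column {column!r} expects {expected_type.__name__}")

def write_batch(table, batch):
    """All-or-nothing append: validate every record before committing any."""
    for record in batch:
        validate(record)
    table.extend(batch)

table = []
write_batch(table, [{"id": 1, "event": "click", "ts": 0.5}])
try:
    # A single malformed record rejects the whole batch, leaving the
    # table unchanged.
    write_batch(table, [{"id": "oops", "event": "click", "ts": 1.0}])
except TypeError:
    pass
```

Real Delta Lake performs an equivalent check against the table's declared schema on every write, and Schema Evolution relaxes it by letting compatible schema changes apply automatically.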

See our Product News from Azure Databricks and AWS to learn more about our latest features.

Instead of parquet…

…simply say delta

Adopting Delta Lake is often as simple as changing the format string from "parquet" to "delta" in your existing Spark read and write code.


Data Ingestion Network

Native connectors to easily ingest data into Delta Lake quickly and reliably from all your applications, databases, and file storage

How It Works

Delta Lake Under the Hood

From Michael Armbrust, Creator of Delta Lake

Delta Lake is an open source storage layer that sits on top of your existing data lake file storage, such as AWS S3, Azure Data Lake Storage, or HDFS. It uses versioned Apache Parquet™ files to store your data. Delta Lake also stores a transaction log that tracks every commit made to the table, which enables expanded capabilities like ACID transactions, data versioning, and audit history. To access the data, you can use the open Spark APIs, any of the available connectors, or a Parquet reader to read the files directly.
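The commit log described above can be sketched with nothing but the Python standard library: each commit is a numbered JSON file of add/remove actions, an exclusive file create makes each commit atomic, and replaying the log up to a given version yields a time-travel snapshot. The file names and action shapes here are simplified illustrations, not the actual Delta transaction protocol.

```python
# Simplified sketch of a Delta-style transaction log. Each commit is a
# zero-padded, numbered JSON file; table state is recovered by replay.
import json
import os
import tempfile

log_dir = tempfile.mkdtemp()

def commit(version, actions):
    path = os.path.join(log_dir, f"{version:020d}.json")
    # O_EXCL makes creating commit N atomic: if two writers race to write
    # the same version, exactly one succeeds and the other must retry.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")

def snapshot(as_of=None):
    """Replay commits up to `as_of` to get the set of live data files."""
    live = set()
    for name in sorted(os.listdir(log_dir)):
        version = int(name.split(".")[0])
        if as_of is not None and version > as_of:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if action["op"] == "add":
                    live.add(action["file"])
                else:
                    live.discard(action["file"])
    return live

commit(0, [{"op": "add", "file": "part-0.parquet"}])
commit(1, [{"op": "add", "file": "part-1.parquet"}])
commit(2, [{"op": "remove", "file": "part-0.parquet"}])

current = snapshot()          # only part-1.parquet is still live
as_of_v1 = snapshot(as_of=1)  # time travel: both files were live at v1
```

In a real Delta table these JSON commits live in a _delta_log/ directory next to the Parquet data files, and the replayed state is periodically checkpointed in Parquet so readers do not have to replay the log from version zero.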

Ready to get started?


Follow the Quick Start Guide