Announcing Delta Lake 3.0: New Universal Format Offers Automatic Compatibility for Apache Iceberg and Apache Hudi
June 28, 2023
New release unifies lakehouse storage formats and reinforces Delta Lake as the best choice for building an open lakehouse
SAN FRANCISCO – June 28, 2023 – Databricks, the Data and AI company, today announced the latest contribution to award-winning Linux Foundation open source project Delta Lake, with the release of Delta Lake 3.0. The upcoming release introduces Universal Format (UniForm), which allows data stored in Delta to be read as if it were Apache Iceberg or Apache Hudi. UniForm takes the guesswork out of choosing an open data format and eliminates compatibility headaches by offering automatic support for Iceberg and Hudi within Delta Lake. Delta Lake 3.0 will allow users to eliminate the complicated integration work caused by different data formats and focus on building highly performant, open lakehouses.
“Databricks created the lakehouse architecture, which is built on Delta Lake. We're committed to making Delta Lake the open format that gives customers the most choice and flexibility, greatest control of their own data, and all the benefits of an open ecosystem,” said Ali Ghodsi, Co-Founder and CEO at Databricks. “Customers shouldn’t be limited by their choice of format. With this latest version of Delta Lake, we’re making it possible for users to easily work with whatever file formats they want, including Iceberg and Hudi, while still accessing Delta Lake’s industry-leading speed and scalability.”
Eliminating Data Silos
Enterprises are rapidly adopting the data lakehouse architecture in a shift away from costly, proprietary data warehouses, which offer limited functionality and cannot support advanced use cases like generative AI. Until now, data-driven organizations moving to the lakehouse have had to weigh their options and choose between three different open table formats. With UniForm, customers can move towards interoperability, and benefit from a combined ecosystem of tools that read from Delta, Iceberg and Hudi.
Delta Lake 3.0 will make it possible for businesses everywhere to access the breadth of their corporate data — from transactional to streaming, structured and unstructured, across any kind of format — in a highly performant manner. New functionality includes:
- Delta Universal Format (UniForm): Now, data stored in Delta can be read as if it were Iceberg or Hudi. With UniForm, Delta automatically generates the metadata needed for Iceberg or Hudi, and thus unifies the table formats so users don’t have to choose or do manual conversions between formats. Organizations can confidently bet on Delta as the universal format that will work across ecosystems and can scale to support the changing needs of their business.
- Delta Kernel: To address connector fragmentation, Kernel will ensure connectors are built against a core Delta library that implements Delta specifications, alleviating the need for users to update Delta connectors with each new version or protocol change. With one stable API to code against, developers in the Delta ecosystem can keep their connectors up to date with the latest Delta innovation, without the burden of having to rework connectors. In turn, users can quickly take advantage of the latest Delta features and updates.
- Delta Liquid Clustering: One of the common challenges that companies face in implementing data use cases is related to performance for reads and writes. The introduction of Liquid Clustering is an innovative leap from decades-old Hive-style table partitioning that uses a fixed data layout. Delta Lake is introducing a flexible data layout technique that will provide cost-efficient data clustering as data grows, which will help companies meet their read and write performance requirements.
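The idea behind UniForm, a single copy of the data with metadata generated for more than one format, can be pictured with a toy model. The Python sketch below is purely illustrative and is not Delta Lake code: `ToyTable`, `commit`, `read_as_delta`, and `read_as_iceberg` are hypothetical names, and the dictionaries are simplified stand-ins for the real Delta and Iceberg metadata structures.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a table: one set of data files, two metadata views."""
    data_files: list = field(default_factory=list)         # shared data file names
    delta_log: list = field(default_factory=list)          # hypothetical Delta-style commit log
    iceberg_snapshots: list = field(default_factory=list)  # hypothetical Iceberg-style snapshots

    def commit(self, new_files):
        # Record the new files once in the primary (Delta-style) log.
        self.data_files.extend(new_files)
        version = len(self.delta_log)
        self.delta_log.append({"version": version, "add": list(new_files)})
        # Derive the secondary (Iceberg-style) metadata from the same commit.
        # It points at the same files; no data is copied or converted.
        self.iceberg_snapshots.append(
            {"snapshot_id": version, "manifest": list(self.data_files)}
        )


def read_as_delta(table):
    # A Delta-style reader replays the commit log to find the live files.
    files = []
    for entry in table.delta_log:
        files.extend(entry["add"])
    return files


def read_as_iceberg(table):
    # An Iceberg-style reader looks only at the latest snapshot's manifest.
    if not table.iceberg_snapshots:
        return []
    return list(table.iceberg_snapshots[-1]["manifest"])


table = ToyTable()
table.commit(["part-000.parquet"])
table.commit(["part-001.parquet", "part-002.parquet"])

# Both readers resolve to the same underlying files.
assert read_as_delta(table) == read_as_iceberg(table)
```

In this simplified picture, each write produces both metadata views, so a reader for either format sees the same files without a manual conversion step.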
“Delta Lake 3.0, including Universal Format and Kernel, underlines the open source community’s dedication to enhancing data reliability and delivering advanced analytics. This release is a step forward in creating a community-driven ecosystem of data integrity, seamless collaboration, and real-time analytics tools,” said Mike Dolan, SVP of Projects, The Linux Foundation.
Delta Lake helps organizations leverage data from hundreds of disparate systems for insights, reporting and building AI models. With this update, Delta Lake continues to build on its unrivaled performance and user-friendly interface. Delta Lake is the only open format that has built-in support for Delta Sharing, the open standard for secure data exchange, which fosters an open data ecosystem that thrives on collaboration across platforms, clouds and regions. Today, over 6,000 active data consumers are exchanging more than 300PB of data every day.
“Collaboration and innovation in the financial services industry are fueled by the open source community and projects like Legend, Goldman Sachs’ open source data platform that we maintain in partnership with FINOS,” said Neema Raphael, Chief Data Officer and Head of Data Engineering at Goldman Sachs. “We’ve long believed in the importance of open source to technology’s future and are thrilled to see Databricks continue to invest in Delta Lake. Organizations shouldn’t be limited by their choice of an open table format and Universal Format support in Delta Lake will continue to move the entire community forward.”
Delta Lake is the Most Widely Used Lakehouse Storage Format in the World
With more than 1 billion downloads per year, and regular feature updates from hundreds of contributing engineers across leading businesses like AWS, Adobe, eBay, Twilio and Uber, Delta Lake is the open format of choice for enterprises that want a flexible, high-performance, open data platform that will scale and adapt as their needs evolve.
To learn more about Databricks' commitment to the open source community visit: https://databricks.com/product/open-source.
Databricks continues to expand the Lakehouse Platform, recently announcing Lakehouse Apps and the general availability of Databricks Marketplace, LakehouseIQ, new governance capabilities, and a suite of data-centric AI tools for building and governing LLMs on the lakehouse.
Availability
The Delta Lake 3.0 release is available in preview today as part of the Linux Foundation’s Delta Lake project.
To learn more about Databricks’ commitment to the open source community, watch the Data + AI Summit live: https://www.databricks.com/dataaisummit/watch
About Databricks
Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Delta Lake, Apache Spark™, and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.
Contact: [email protected]
Safe Harbor Statement
This information is provided to outline Databricks’ general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks’ discretion and may not be delivered as planned or at all.