Towards Multi-Table Transactions in Delta Lake
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Data Lakehouse Architecture |
INDUSTRY | Enterprise Technology, Health and Life Sciences, Travel and Hospitality, Financial Services |
TECHNOLOGIES | Delta Lake, Governance |
SKILL LEVEL | Intermediate |
DURATION | 20 min |
This talk introduces Coordinated Commits, a new commit protocol for Delta Lake that moves the source of commit atomicity from the object store to an external commit coordinator (e.g., HMS, Unity Catalog, or Glue). This gives more flexibility in how transactions are performed and lays the foundation for advanced features such as multi-statement transactions. Delta was originally built on the premise that cloud storage is the source of truth. However, cloud storage offers limited primitives for atomicity; more specifically, object stores lack the means to atomically commit more than a single write or statement. Coordinated Commits aims to address the following:
- Support multi-table-multi-statement transactions.
- Provide reliable commit semantics even when the underlying object store lacks put-if-absent semantics (e.g., S3).
- Enable data governance over write operations.
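
To make the idea concrete, here is a minimal sketch of a commit coordinator as the arbiter of table versions. It is a toy in-memory model, not the Delta Lake implementation or its API: the class `InMemoryCommitCoordinator`, the exception `CommitConflict`, and the table names are all hypothetical. It only illustrates how a single external authority can decide which writer wins each version, and how one call can advance several tables together or not at all.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer tries to commit a version that is not the next one."""


class InMemoryCommitCoordinator:
    """Hypothetical stand-in for an external commit coordinator.

    The coordinator, not the object store, decides which writer wins
    each table version.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}  # table name -> latest committed version

    def latest_version(self, table):
        # -1 means the table has no commits yet.
        with self._lock:
            return self._latest.get(table, -1)

    def commit(self, versions):
        """Atomically commit one or more tables.

        `versions` maps table name -> version the writer wants to create.
        Either every table advances or none does.
        """
        with self._lock:
            # Validate every table first so a failure changes nothing.
            for table, version in versions.items():
                expected = self._latest.get(table, -1) + 1
                if version != expected:
                    raise CommitConflict(
                        f"{table}: expected version {expected}, got {version}"
                    )
            # All checks passed, so apply every update under the same lock.
            self._latest.update(versions)


coordinator = InMemoryCommitCoordinator()
coordinator.commit({"orders": 0})
coordinator.commit({"orders": 1, "payments": 0})  # multi-table commit

try:
    # A stale writer still believes "orders" is at version 0.
    coordinator.commit({"orders": 1, "payments": 1})
except CommitConflict:
    pass  # neither table advanced

print(coordinator.latest_version("orders"), coordinator.latest_version("payments"))
```

Because the version check and the update happen inside the coordinator, this sketch does not rely on the storage layer offering put-if-absent. A real coordinator would also need durability and a way to make committed data visible to readers, both of which are left out here.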
SESSION SPEAKERS
Prakhar Jain
Staff Software Engineer
Databricks