Reducing Transaction Conflicts in Databricks—Fundamentals and Applications at Asana
Overview
Experience | In Person |
---|---|
Type | Lightning Talk |
Track | Data Lakehouse Architecture and Implementation |
Industry | Enterprise Technology |
Technologies | Apache Spark, Delta Lake |
Skill Level | Intermediate |
Duration | 20 min |
Concurrent ACID transactions on Databricks can run into transaction conflicts. The first part of this talk covers the basics of concurrent transaction behavior in Databricks: what happens when various combinations of INSERT, UPDATE and MERGE INTO run concurrently, and how table isolation level, partitioning and deletion vectors affect the outcome. The second part focuses on how one pipeline at Asana evolved to reduce transaction conflicts. As the number of writers to a table grew, we first implemented writer-specific partitioning to reduce transaction conflicts. Later, we added an intermediate blind-append stage, which let us avoid transaction conflicts while using liquid clustering rather than partitioning for improved read and write performance.
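The conflict behavior described in the abstract can be illustrated with a small, self-contained toy model of optimistic concurrency control. This is only a sketch under simplifying assumptions, not Delta Lake's actual conflict detection: `ToyTable`, `Txn`, `ConflictError` and `run_pair` are hypothetical names, conflicts are tracked per partition label rather than per file, and isolation levels and deletion vectors are not modeled. A transaction conflicts if a commit that landed after it started wrote to a partition it read. A blind append reads nothing, so in this model it never conflicts.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a concurrent commit wrote to data this transaction read."""


@dataclass
class Txn:
    start_version: int  # table version when the transaction began
    reads: frozenset    # partitions scanned to decide what to rewrite
    writes: frozenset   # partitions the transaction adds or rewrites data in


@dataclass
class ToyTable:
    # One entry per commit: the set of partitions that commit wrote to.
    log: list = field(default_factory=list)

    def begin(self, reads=(), writes=()):
        return Txn(len(self.log), frozenset(reads), frozenset(writes))

    def commit(self, txn):
        # Check every commit that landed after this transaction started.
        for written in self.log[txn.start_version:]:
            overlap = txn.reads & written
            if overlap:
                raise ConflictError(sorted(overlap))
        self.log.append(txn.writes)
        return len(self.log)


def run_pair(first, second):
    """Begin two transactions concurrently, commit them in order,
    and return True if the second one hits a conflict."""
    table = ToyTable()
    a = table.begin(**first)
    b = table.begin(**second)
    table.commit(a)
    try:
        table.commit(b)
    except ConflictError:
        return True
    return False


# Two read-then-rewrite writers (UPDATE/MERGE-like) on an unpartitioned table.
shared = run_pair({"reads": {"all"}, "writes": {"all"}},
                  {"reads": {"all"}, "writes": {"all"}})

# The same writers, each confined to its own writer-specific partition.
partitioned = run_pair({"reads": {"writer=a"}, "writes": {"writer=a"}},
                       {"reads": {"writer=b"}, "writes": {"writer=b"}})

# Two blind appends: they write data but read nothing.
appends = run_pair({"writes": {"all"}}, {"writes": {"all"}})

print(shared, partitioned, appends)  # True False False
```

In this simplified model, both mitigations from the abstract remove the conflict for the same reason: the second writer's read set no longer overlaps what the first writer committed. Writer-specific partitioning achieves this by separating the data, while a blind-append stage achieves it by not reading at all, which leaves the table layout free for other choices such as liquid clustering.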
Session Speakers
Dima Kamalov
Software Engineer
Asana