Session

Reducing Transaction Conflicts in Databricks—Fundamentals and Applications at Asana

Overview

Experience: In Person
Type: Lightning Talk
Track: Data Lakehouse Architecture and Implementation
Industry: Enterprise Technology
Technologies: Apache Spark, Delta Lake
Skill Level: Intermediate
Duration: 20 min

Concurrent ACID transactions on Databricks can run into transaction conflicts. The first part of this talk covers the basics of concurrent transactions in Databricks: what happens when various combinations of INSERT, UPDATE, and MERGE INTO run concurrently, and how table isolation level, partitioning, and deletion vectors affect the outcome. The second part follows the evolution of one pipeline at Asana to reduce transaction conflicts. As the number of writers to a table grew, we first implemented writer-specific partitioning. Later, we added an intermediate blind append stage, which let us avoid transaction conflicts while using liquid clustering rather than partitioning for improved read and write performance.
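
To make the conflict rules concrete, here is a toy model in plain Python. It is not taken from the talk and does not call any Databricks or Delta Lake API. It assumes the default WriteSerializable isolation level, treats a blind append (an INSERT that reads nothing from the target table) as never conflicting, and treats two other transactions as conflicting when the sets of partitions they touch overlap. The names `Txn`, `conflicts`, and the `writer=...` partition labels are hypothetical, and the model ignores deletion vectors and row-level concurrency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical stand-in for one write transaction against a table."""
    name: str
    blind_append: bool  # True for an INSERT that reads nothing from the table
    partitions: frozenset = frozenset()  # partitions the transaction reads or rewrites


def conflicts(a: Txn, b: Txn) -> bool:
    """Simplified rule, assuming WriteSerializable isolation.

    Blind appends are treated as never conflicting; any other pair
    conflicts when the partitions they touch overlap.
    """
    if a.blind_append or b.blind_append:
        return False
    return bool(a.partitions & b.partitions)


# Writer-specific partitioning: each MERGE touches only its own partition.
merge_a = Txn("merge_writer_a", False, frozenset({"writer=a"}))
merge_b = Txn("merge_writer_b", False, frozenset({"writer=b"}))

# A MERGE whose condition is not limited to one partition touches both.
merge_all = Txn("merge_all_writers", False, frozenset({"writer=a", "writer=b"}))

# Intermediate stage: writers only append, so they never block each other.
append = Txn("insert_staging", True)

print(conflicts(merge_a, merge_b))    # False
print(conflicts(merge_a, merge_all))  # True
print(conflicts(append, merge_all))   # False
```

In this simplified picture, writer-specific partitioning removes the overlap between concurrent MERGE operations, and a blind append stage removes the read dependency altogether, which is why the target table's layout no longer has to be organized around the writers.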

Session Speakers

Dima Kamalov

Software Engineer
Asana