Session

Reducing Transaction Conflicts in Databricks—Fundamentals and Applications at Asana

Overview

Experience	In Person
Type	Lightning Talk
Track	Data Lakehouse Architecture and Implementation
Industry	Enterprise Technology
Technologies	Apache Spark, Delta Lake
Skill Level	Intermediate
Duration	20 min

Concurrent writes to a table with ACID guarantees on Databricks can fail with transaction conflicts. The first part of this talk covers the basics of concurrent transactions in Databricks: what happens when various combinations of INSERT, UPDATE and MERGE INTO run at the same time, and how table isolation level, partitioning and deletion vectors affect the outcome. The second part follows the evolution of one pipeline at Asana to reduce transaction conflicts. As the number of writers to a table grew, we first implemented writer-specific partitioning. Later, we added an intermediate blind-append stage, which let us avoid transaction conflicts while using liquid clustering rather than partitioning for better read and write performance.
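
The ideas in the abstract can be illustrated with a small toy model. This is a sketch only, not Delta Lake's actual conflict detection and not Asana's pipeline: the `Txn` class, the `conflicts` function, and the partition names are all hypothetical. It assumes behavior loosely modeled on the WriteSerializable isolation level, where a blind append (a write that does not read the table first) does not conflict with other writes, and two read-then-write operations such as MERGE INTO conflict only when one wrote to data the other read. Deletion vectors and row-level concurrency are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """A hypothetical write, described by the partitions it reads and writes."""
    name: str
    reads: frozenset = frozenset()   # partitions scanned to decide what to write
    writes: frozenset = frozenset()  # partitions receiving new or rewritten files

    @property
    def is_blind_append(self) -> bool:
        # A blind append adds files without reading existing table data.
        return len(self.reads) == 0


def conflicts(committed: Txn, pending: Txn) -> bool:
    """Toy optimistic-concurrency check (assumed WriteSerializable-like rules).

    `pending` started before `committed` finished and now tries to commit.
    """
    if committed.is_blind_append or pending.is_blind_append:
        return False
    # Both read before writing: conflict if the winner touched what the loser read.
    return bool(committed.writes & pending.reads)


# Case 1: two MERGE-style writers on an unpartitioned table scan everything.
WHOLE_TABLE = frozenset({"*"})
merge_a = Txn("merge_a", reads=WHOLE_TABLE, writes=WHOLE_TABLE)
merge_b = Txn("merge_b", reads=WHOLE_TABLE, writes=WHOLE_TABLE)
unpartitioned = conflicts(merge_a, merge_b)

# Case 2: writer-specific partitioning, each writer stays in its own partition.
part_a = frozenset({"writer=a"})
part_b = frozenset({"writer=b"})
merge_pa = Txn("merge_pa", reads=part_a, writes=part_a)
merge_pb = Txn("merge_pb", reads=part_b, writes=part_b)
partitioned = conflicts(merge_pa, merge_pb)

# Case 3: both writers blind-append into an intermediate staging table.
append_a = Txn("append_a", writes=frozenset({"staging"}))
append_b = Txn("append_b", writes=frozenset({"staging"}))
staged = conflicts(append_a, append_b)

print(unpartitioned, partitioned, staged)  # True False False
```

In this model, the blind-append stage removes conflicts without requiring the table layout to be organized by writer, which is what leaves the layout free for something like liquid clustering.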

Session Speakers

Dima Kamalov

Software Engineer
Asana