Patterns and Operational Insights from the First Users of Delta Lake

Cyber threat detection and response requires demanding workloads over large volumes of log and telemetry data. A few years ago I came to Apple after building such a system at another FAANG company, and my boss asked me to do it again. I learned a lot from my prior experience using Apache Spark and AWS S3 at massive scale: some good patterns, but also some bad patterns and pieces of technology that I wanted to avoid. That year I ran into Michael Armbrust at Spark+AI Summit and described what I wanted to do, along with a plan to test Databricks as a foundation for the new system. A few months later, while we were in the middle of our proof-of-concept build-out on Databricks, Michael gave me some code they were calling Tahoe. It was the early alpha of what became Delta Lake, and it was exactly what we wanted. We have been running our entire system, writing out hundreds of TB of data a day, on Delta Lake since the very beginning.

This presentation will cover some of the issues we encountered and things we have learned about operating very large workloads on Databricks and Delta Lake.

  • Effective Delta Lake patterns for streaming ETL, data enrichments, analytic workloads, large dataset queries, and Large Materialized Aggregates for fast answers
  • Z-ordering and the 32-column default limit (oops): optimizing your schema to ensure Z-ordering is effective
  • Date partitioning and the implications of event times with long-tail distributions or from unsynchronized clocks
  • Optimize, optimize, optimize, and when auto-optimize is your only option
  • Upsert patterns that have simplified important jobs
  • Tuning Delta Lake for very large tables and low-latency access

 
About Dominique Brezinski

Apple

Dominique Brezinski is a member of Apple's Information Security leadership team and a principal engineer working with the Threat Response org. He has twenty-five years of experience in security engineering, with a focus on the design and development of intrusion detection and incident response systems. Dom has been working with Apache Spark in production since the 0.8 release.