Databricks’ Data Pipelines: Journey And Lessons Learned

Download Slides

With components like Spark SQL, MLlib, and Streaming, Spark is a unified engine for building data applications. In this talk, we will take a look at how we use Spark on our own Databricks platform throughout our data pipeline for use cases such as ETL, data warehousing, and real time analysis. We will demonstrate how these applications empower engineering and data analytics. We will also share some lessons learned from building our data pipeline around security and operations. This talk will include examples on how to use Structured Streaming (a.k.a Streaming DataFrames) for online analysis, SparkR for offline analysis, and how we connect multiple sources to achieve a Just-In-Time Data Warehouse.