Radical Speed for SQL Queries on Databricks: Photon Under the Hood

Join this session to hear from the Photon product and engineering team talk about the latest developments with the project.

As organizations embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly ingest and analyze massive amounts and types of data. With their data lakes, organizations can store all their data assets in cheap cloud object storage. But data lakes alone lack robust data management and governance capabilities. Fortunately, Delta Lake brings ACID transactions to your data lakes – making them more reliable while retaining the open access and low storage cost you are used to.

Using Delta Lake as its foundation, the Databricks Lakehouse platform delivers a simplified and performant experience with first-class support for all your workloads, including SQL, data engineering, data science & machine learning. With a broad set of enhancements in data access and filtering, query optimization and scheduling, as well as query execution, the Lakehouse achieves state-of-the-art performance to meet the increasing demands of data applications. In this session, we will dive into Photon, a key component responsible for efficient query execution.

Photon was first introduced at Spark and AI Summit 2020 and is written from the ground up in C++ to take advantage of modern hardware. It uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications — all natively on your data lake. Photon is fully compatible with the Apache Sparkā„¢ DataFrame and SQL APIs to ensure workloads run seamlessly without code changes. Come join us to learn more about how Photon can radically speed up your queries on Databricks.

About Greg Rahn

For over 20 years, Greg has worked with relational database systems across a variety of roles - including software engineering, database administration, database performance engineering, and product management - providing a holistic view and expertise on the database market. As a member of the product team at Databricks, his focus areas are SQL product strategy and performance. Prior to joining Databricks, Greg was part of the esteemed Real-World Performance Group at Oracle and held SQL product management roles at Cloudera and Snowflake.

Alex Behm
About Alex Behm

Alex has been building databases for over a decade in academia and industry, and maintains a passion for speed and quality. He is the tech lead for Photon, a new vectorized engine written from scratch in C++, that powers Databricks' Delta Engine. Alex holds a PhD in databases from UC Irvine.