Faster query performance
Built for real-world applications, Photon provides best-in-class performance for your SQL workloads, directly on your data lake.
No code changes
Designed to be compatible with Apache Spark APIs, Photon will work with your existing code — no rewrite required.
Broad language support
Photon currently supports SQL workloads but will ultimately accelerate all your data use cases — from streaming to batch workloads — using SQL, Python, R, Scala and Java.
Query performance on Databricks has steadily increased over the years, powered by Apache Spark and thousands of optimizations packaged as part of the Databricks Runtimes (DBR). Photon, a new native vectorized engine written entirely in C++, provides an additional 2x speedup on the TPC-DS 1TB benchmark, and customers have observed average speedups of 2x–4x on their own workloads compared to the latest DBR versions.
Accelerate large-scale production jobs on SQL and Spark DataFrames.
Run time-series analysis faster with Photon than with Spark and the traditional Databricks Runtime.
Data privacy and compliance
Query petabyte-scale datasets to identify and delete records, without duplicating data, using Delta Lake, production jobs and Photon.
Loading data into Delta Lake and Parquet
Photon’s vectorized I/O speeds up data loads for Delta Lake and Parquet tables, lowering overall runtime and costs of data engineering jobs.
Works with your existing code and avoids vendor lock-in
Photon is designed to be compatible with the Apache Spark DataFrame and SQL APIs so that workloads run without code changes. All you have to do to benefit from Photon is turn it on. Photon coordinates work and resources and transparently accelerates portions of your SQL and Spark queries. No tuning or user intervention is required.
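As a rough illustration of what "turn it on" can look like, the sketch below builds a cluster definition that requests the Photon engine. It assumes the cluster is created through the Databricks Clusters API and that the `runtime_engine` field is available in your workspace. The helper name `photon_cluster_spec`, the cluster name, the runtime version string and the node type are placeholders chosen for this example, not values from this page. How Photon is actually enabled may differ by release and cloud.

```python
import json


def photon_cluster_spec(name, spark_version, node_type_id, num_workers):
    """Return a cluster definition that asks for the Photon engine.

    Hypothetical helper for illustration only. The field names follow the
    Databricks Clusters API as we understand it; check them against the
    API reference for your workspace before relying on them.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,   # placeholder runtime version
        "node_type_id": node_type_id,     # placeholder instance type
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",       # assumption: the switch that selects Photon
    }


# Example values are placeholders, not recommendations.
spec = photon_cluster_spec(
    name="photon-demo",
    spark_version="13.3.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
)

# Serialize the definition as the JSON body of a cluster-creation request.
payload = json.dumps(spec, indent=2)
print(payload)
```

The only Photon-specific part is the single `runtime_engine` entry. The SQL and DataFrame code that runs on the cluster stays unchanged, which is the point of the compatibility claim above.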
Optimizing for all data use cases and workloads
Although the new engine is designed to accelerate all workloads, during preview Photon is focused on running SQL workloads faster and reducing your total cost per workload. Ultimately, Photon will support all data and machine learning use cases.