This is part 1 of a blog series looking back at the major areas of progress for Databricks SQL in 2023. This first post focuses on performance. Performance matters for a data warehouse because it makes for a more responsive user experience and better price/performance, especially in the modern SaaS world where compute time drives cost. We have been working hard to deliver the next set of performance advancements for Databricks SQL while using AI to reduce the need for manual tuning.
Modern data warehouses are filled with workload-specific configurations that a knowledgeable administrator must tune manually and continuously as new data, more users or new use cases come in. These "knobs" range from how data is physically stored to how compute is utilized and scaled. Over the past year, we have been applying AI to remove these performance and administrative knobs in alignment with Databricks' vision for a Data Intelligence Platform.
These innovations have enabled us to make significant advances in performance without increasing complexity for the user or costs.
Databricks SQL has long been a frontrunner in terms of performance and cost efficiency for ETL workloads. Our investment in AI-powered features, such as Predictive IO, helps sustain that leadership position and enhance cost advantages as data volumes continue to grow. This is evident in our processing of ETL workloads where Databricks SQL has up to a 9x cost advantage vs. leading industry competition (see benchmark below).
Databricks SQL now matches leading industry competition on low-latency query performance for smaller numbers of concurrent users (< 100) and has 9x better performance as the number of concurrent users grows to over one thousand (see benchmark below). Serverless compute also starts a warehouse within a few seconds, right when it is needed, which creates substantial cost savings by removing the need to keep clusters running all the time or to shut them down manually. When workload demand drops, SQL Serverless automatically downscales clusters or shuts down the warehouse to keep costs low.
Databricks SQL has unified governance, a rich ecosystem of your favorite tools, and open formats and APIs to avoid lock-in -- all part of why the best data warehouse is a lakehouse. If you want to migrate your SQL workloads to a cost-optimized, high-performance, serverless and seamlessly unified modern architecture, Databricks SQL is the solution. Talk to your Databricks representative to get started on a proof-of-concept today and experience the benefits firsthand. Our team is ready to help you evaluate whether Databricks SQL is the right choice for innovating faster with your data.
To learn more about how we achieve best-in-class performance on Databricks SQL using AI-driven optimizations, watch two sessions from the Data+AI Summit: Reynold Xin's keynote and "Databricks SQL Serverless Under the Hood: How We Use ML to Get the Best Price/Performance".