We are excited to share the latest new features and performance improvements that make Databricks SQL simpler, faster and lower cost than ever. With over 7,000 customers using Databricks SQL as their data warehouse today, this has become the fastest-growing product in our history!
The best data warehouse is a lakehouse
Databricks SQL is built on the lakehouse architecture. We pioneered this approach in early 2020 and launched Databricks SQL (DBSQL) as part of the Databricks Data Intelligence Platform. We predicted that standalone, separate data warehouses would become legacy systems due to their high costs and proprietary nature, and today we see strong evidence this is true: the MIT Technology Insights report shows 74% of enterprises have already adopted the lakehouse architecture. The many lakehouse-based data platforms available for these enterprises were recently reviewed in the Forrester Wave for Data Lakehouses, which recognized Databricks as a Leader with the highest scores in both current offering and strategy categories in comparison to all others!
In our conversations with customers, the lakehouse advantage comes from two things: the lower total cost and one unified platform for AI and BI. The lakehouse makes it possible to use one copy of the data, in an open format, for all your AI and BI workloads. That eliminates the data duplication and replication needed to keep data in sync between multiple platforms, dramatically lowering cost and simplifying the architecture.
AI-powered performance: 4x improvement
Last year, we declared the classic approach to system performance, based on heuristics and cost optimizers, was wrong most of the time! While those techniques were the best available, the current era of AI has enabled a whole new approach. Today, we use a new generation of AI systems at all layers of our platform that have taken system performance improvements to a new level. These AI systems analyze your workloads and improve efficiency and performance automatically.
- Liquid Clustering, now GA, manages the layout of your data, automatically choosing the clustering key and providing the flexibility to redefine clustering keys without data rewrites! This allows your data layout to evolve alongside analytic needs over time and replaces table partitioning and ZORDER so you no longer have to fine-tune your data layout.
- Predictive I/O, also known as "Indexless Indexing", gives you the performance of indexes but without requiring the creation or overhead maintenance of indexes. Thanks to advancements in Mosaic AI systems, we are now able to run models and input feature vectors with an order of magnitude larger parameters without any noticeable increase in prediction latency. This enables predictive I/O to support a much wider set of workloads.
- Intelligent Workload Management uses machine learning models to optimize serverless SQL warehouses resources to best support high-concurrency. This is perfect for BI workloads at scale when large numbers of analysts and queries are hammering the data warehouse. Intelligent Workload Management ensures these workloads have the right amount of resources quickly.
- Predictive Optimization, now GA, automatically handles the typical maintenance operations for tables that help optimize performance. Databricks will identify tables that would benefit from maintenance operations, such as clustering, file size adjustments and file vacuuming, and simply run them for you—no manual tasks required.
These are just some of our built-in AI systems and the best part is you don't need to know the details of how they operate-the magic just happens automatically. Given the amount of time we spend in this area, it's fair to say we are obsessed with performance, and over time we can see what a difference it has made. When we looked at repeating workloads for our customers, performance for the same BI queries has improved by 73% since two years ago! That is 4x faster!
AI Assistant for SQL Analysts
We have also infused AI into our user experience, making Databricks SQL easier to use and more productive for SQL analysts. The Databricks AI Assistant, now generally available, is a built-in, context-aware AI assistant that helps SQL analysts create, edit and debug SQL. This assistant is built on the same data intelligence engine in our platform, so it understands the unique context of your business. The assistant has seen rapid adoption at Databricks because of how well it can draft queries or fix errors for SQL analysts, saving countless hours of time and boosting productivity.
Leverage AI models directly via SQL
With the rise of GenAI and ML models, it's no surprise that SQL analysts want to access those AI models directly within SQL more and more. We first introduced AI functions in Databricks SQL last year for exactly that reason and we have seen rapid adoption ever since. AI Functions are now in public preview and we have added new functions such as vector search as well. AI Functions abstracts away the technical complexities of using LLMs, allowing analysts and data scientists to utilize these models effortlessly, without needing to worry about the underlying infrastructure.
- The ai_query() function allows you to query any AI model from SQL. These can be GenAI models or Classic ML models. You can even use external LLM models
SELECT sku_id, product_name, ai_query( "llama3-8B-instruct", "You are a marketing expert for a winter holiday promotion targeting GenZ. Generate a promotional text in 30 words mentioning a 50% discount for product: " || product_name ) FROM uc_catalog.schema.retail_products WHERE inventory > 2 * forecasted_sales
- Built-in LLM functions
There are also 9 new GenAI functions that allow you to analyze unstructured text with the power of LLMs. For example:
Extract important information from text that is present in a table's column:
Classify a product's review comments based on the content:SELECT ai_extract( 'John Doe lives in New York and works for Acme Corp.', array('person', 'location', 'organization'))
See all 9 functions hereSELECT review_comments, ai_classify(description, ARRAY('clothing', 'shoes', 'accessories', 'furniture')) AS category FROM Products
- Vector Search: The new vector search function lets you perform KNN searches and enables easy out-of-the-box RAG! This uses Databricks' Vector Search product. By combining vector search capabilities and AI_query capabilities SQL analysts can now easily run complex analyses. For example, one can now search all tweets
SELECT Tweet FROM vector_search( index => “main.default.ai_tweets_2024_idx”, query => “retail”, num_results => 10 )
- AI_Forecast: A new time series forecasting built-in function so you can forecast metrics (e.g. revenue) quickly via SQL without needing to build a custom ML model.
SELECT * FROM ai_forecast( TABLE(historical_revenue_table), horizon => '2016-03-31', time_col => 'ds', value_col => 'revenue' )
AI/BI: a new type of business intelligence (BI) product
With the goal of truly democratizing insights from data, we also introduced Databricks AI/BI, a business intelligence product that leverages generative AI to deeply understand data semantics and enable self-service data analysis for everyone in your organization. Built on a compound AI system, AI/BI leverages insights from your entire data estate, including metadata from Unity Catalog, ETL pipelines SQL queries and more. It features two main components: AI/BI Dashboards, a low-code BI offering to quickly create data visualizations and dashboards, and Genie, a conversational interface for your data that continuously learns from user feedback to answer a wide range of real-world business questions without hallucinations. These innovations significantly enhance self-service analytics within Databricks SQL, enabling a broader range of non-technical users while ensuring unified governance, lineage tracking, secure sharing, and high performance through integration with your Data Intelligence Platform.
Complete, end-to-end data warehousing with Databricks SQL
Apart from new AI features, we have also launched a series of core SQL Warehouse capabilities. Thousands of customers have migrated their legacy data warehouses to DBSQL. To make those migrations possible, we made sure DBSQL had all the features to provide the same data warehouse capabilities on the lakehouse:
- Materialized Views: Ensure data freshness by using MVs to power your dashboards. Materialized views automatically update when underlying tables have fresh data instead of when they are queried.
- Use PK/FK constraints to optimize query performance. By using the RELY, queries can be sped up by eliminating redundant joins and distinct aggregations automatically.
- Variant is a new data-type for processing semi-structured data offering a significant performance boost compared to storing data as JSON strings, while still providing the flexibility to support highly nested and evolving schemas.
- Lateral Column Aliases make it easier to write SQL by being able to refer to a reuse an expression specified earlier in the same query. This can help simplify queries by reducing unnecessary CTEs or sub-queries.
- Features like SQL Variables, Named Arguments & Python UDFs are also making it easier to build scripts in Databricks SQL directly.
Don't forget, all of this works in a great AI powered SQL Editor and built-in dashboarding tool.
Plus, thanks to our great partners, we also have a rich, open and integrated ecosystem of your favorite data and AI tools, such as Power BI, Tableau and dbt. It's almost certain that whatever tools you are using today already work with DBSQL.
Learn more and get started with Databricks SQL
To learn more about the latest on data warehousing and Databricks SQL, check out the Data Warehouse keynote from Data + AI Summit along with the many sessions from the Data Warehousing, Analytics and BI track.
If you want to migrate your existing warehouse to a high-performance, serverless data warehouse with a great user experience and lower total cost, then Databricks SQL is the solution -- try it for free.