Announcing the General Availability of Materialized Views and Streaming Tables for Databricks SQL
We’re excited to announce that materialized views (MVs) and streaming tables (STs) are now Generally Available in Databricks SQL on AWS and Azure. Streaming tables offer simple, incremental ingestion from sources like cloud storage and message buses with just a few lines of SQL. Materialized views precompute and incrementally update the results of queries so your dashboards and queries can run significantly faster than before. Together, they allow you to create efficient and scalable data pipelines from ingestion to transformation using just SQL.
In this blog, we’ll dive into how these tools empower analysts and analytics engineers to deliver data and analytics applications more effectively within the DBSQL warehouse. Plus, we’ll cover new capabilities of MVs and STs that enhance monitoring, error troubleshooting, and cost tracking.
Challenges faced by data warehouse users
Data warehouses are the primary location for analytics and internal reporting through business intelligence (BI) applications. SQL analysts must efficiently ingest and transform large data sets, ensure fast query performance for real-time analytics, and manage the balance between quick data access and cost controls. They face several challenges in achieving these goals:
- Slow end-user queries and dashboards: Large BI dashboards process complex views of big datasets, leading to slow queries that hinder interactivity and increase costs due to repeated data reprocessing.
- Improving data freshness while keeping costs down: Precomputing results can reduce query latency but often leads to stale data and high costs, requiring complex incremental processing to maintain fresh data at a reasonable cost.
- Self-service: Traditional SQL pipelines rely on complex manual coding, slowing down responses to business needs.
Materialized views and streaming tables give you fast, fresh data
MVs and STs solve these challenges by combining the ease of views with the speed of precomputed data, thanks to the power of automatic end-to-end incremental processing. This lets engineers deliver fast queries without needing to write complex code, while ensuring the data is as up-to-date as the business requires.
Fast queries and dashboards with MVs
Materialized Views (MVs) enhance the performance of SQL analytics and BI dashboards by pre-computing and storing query results in advance, significantly reducing query latency. Instead of repeatedly querying the base tables, MVs allow dashboards and end-user queries to retrieve pre-aggregated or pre-joined data, making them much faster. Additionally, querying MVs is more cost-effective compared to views, as only the data stored in the MV is accessed, avoiding the overhead of reprocessing the underlying base tables for every query.
Move to real-time use cases while keeping costs low
STs and MVs work together to create fully incremental data pipelines, ideal for real-time use cases. STs continuously ingest and process streaming data, ensuring BI dashboards, machine learning models, and operational systems always have the most up-to-date information. MVs, on the other hand, automatically refresh incrementally as new data arrives, keeping data fresh for users without manual input, while also reducing processing costs by avoiding full view rebuilds. Combining STs and MVs provides the best cost-performance balance for real-time analytics and reporting.
MVs with incremental refresh can also save significant time and money. In our internal benchmarks on a 200 billion-row table, MV refreshes were 98% cheaper and 85% faster than refreshing the whole table, resulting in ~7x better data freshness at 1/50th of the cost of a similar CREATE TABLE AS statement.
Empower your analysts to build data pipelines in DBSQL
Using MVs and STs to develop data pipelines automates much of the manual work involved in managing tables and DML code, freeing analytics engineers to focus on business logic and delivering greater value to the organization with a simple SQL syntax. STs further simplify data ingestion from various sources, like cloud storage and message buses, by eliminating the need for complex configurations.
Utilizing Materialized Views effectively on top of transaction tables has resulted in a drastic improvement in query performance on analytical layer, with the query time decreasing up to 85% on a 500 million fact table. This enables our Business team to consume analytical dashboards more efficiently and make quicker decisions based on the insights gained from the data.— Shiv Nayak / Head of Data and AI Architecture, EasyJet
We've significantly reduced the time needed to handle large volumes using Databricks materialized views. This enhancement has cut our runtime by 85%, enabling our team to work more efficiently and focus on machine learning and business intelligence insights. The simplified process supports more significant data volumes and contributes to overall cost savings and increased project agility.— Sam Adams, Senior Machine Learning Engineer, Paylocity
“The conversion to Materialized Views has resulted in a drastic improvement in query performance… Plus, the added cost savings have really helped.”— Karthik Venkatesan, Security Software Engineering Sr. Manager, Adobe
“We’ve seen query performances improve by 98% with some of our tables that have several terabytes of data.”— Gal Doron, Head of Data, AnyClip
“Utilizing Materialized Views on top of Transaction tables has drastically improved query performance on our analytical layer, with the execution time decreasing up to 85% on a 500 million fact table.”