Skip to main content

Materialized Views

What is a materialized view?

A materialized view is a database object that stores the results of a query as a physical table. Unlike regular database views, which are virtual and derive their data from the underlying tables, materialized views contain precomputed data that is incrementally updated on a schedule or on demand. This precomputation of data allows for faster query response times and improved performance in certain scenarios.

Materialized views are especially useful in situations where complex queries or aggregations are performed frequently, and the underlying data changes infrequently. By storing the precomputed results, the database can avoid the need to execute complex queries repeatedly, resulting in faster response times.

Here’s more to explore

Big Book of Data Engineering: 2nd Edition

The latest technical guidance for building real-time data pipelines.

Download now

Big Book of Data Warehousing and BI

Your complete how-to guide to data warehousing with the Data Intelligence Platform — including real-world use cases.

Read now

Learn data engineering now

Watch 4 videos and pass a quiz to earn a badge.

Get started

Is data materialization the same as a materialized view?

While the terms “data materialization” and “materialized views” are related, they do refer to different concepts. So, what does materialized mean in this context?

Data materialization is the concept of storing computed data in a physical medium, such as a hard disk. It’s typically created from a virtual or logical view with one core purpose: to improve performance.

In some respects, data materialization is similar to caching. Both processes involve storing data in a way that makes retrieving it more efficient. The main difference between the two is that caching is used for temporarily storing data close to the point of use, whereas materialized data tends to have a longer lifecycle and a more well-defined update schedule.

With this data materialization definition, we can now explore how it is related to the concept of materialized views.

A materialized view is a type of data materialization used in relational databases to store the results of a query for faster retrieval.

Using materialization in a data warehouse setting is largely about convenience. When data is stored, the choice of how to store it generally follows from how the data is formatted to begin with. But when you want to read the data, that choice might not be conducive to easy retrieval.

For instance, suppose you have a large dataset from which you need to regularly read a subset of data. Retrieving it afresh each time could be relatively time-consuming because you would have to run the query on the full set of data every time, even though you only wanted access to a small portion of it.

In this situation, creating a materialized view would be beneficial. You would set it up to be pre-populated with the data you need to read and define a refresh schedule to happen either automatically or when the system detects a change to the underlying source data.

Comparing database objects: materialized views vs. views

It’s beneficial at this point to define “views” and explain how materialized views are different. Both views and materialized views are database objects that present data in a specific format or as a result of a specific query. However, they differ significantly in terms of storage, performance and use cases (which we will explore in more detail later). 

When comparing a view to a materialized view, a crucial consideration is that a view is a virtual table based on the result set of a SQL query. Importantly, it does not store the data itself; instead, it dynamically retrieves data from the underlying tables whenever the view is queried.

This is fundamentally different from a materialized view. A critical element of how we define “materialized” is in terms of data storage. Specifically, a materialized view is a database object that physically stores the data results it contains from a query.

This means that, unlike a regular view, a materialized view does not retrieve data dynamically. As it stores the query result set, and is refreshed periodically to reflect changes in the underlying tables, it results in improved query performances and more efficient use of processing resources.

Although storage can be considered the main difference between views and materialized views, the table below presents a comparison of how this affects the different features of the two database objects:

FeatureView

Materialized View

Data Storage

Does not store data (virtual table)

Physically stores data (precomputed results)

Data Retrieval

Retrieves data dynamically from base tables

Retrieves data from stored results

Performance

Slower for complex queries

Faster for complex queries

Data Freshness

Always current

Can be stale; requires refresh

Storage Space

No additional storage needed

Requires additional storage space

Use Case

Simplifying complex queries, security

Performance improvement, snapshot data

What are the advantages and disadvantages of materialized views in databases?

Materialized views offer several advantages, but also a number of limitations, in databases, including on the Databricks Platform. Understanding the advantages and disadvantages thoroughly can help users determine when and how to use them effectively.

Materialized view advantages

  1. Improved query performance

    One of the main reasons for creating materialized views is to improve query performance. They achieve this in two essential ways: by speeding up data retrieval and reducing the load on base tables.

    Because they store precomputed results, they eliminate the need to re-compute data and resolve a query (or join in a query) every time. As a result, this significantly speeds up query execution, especially for complex and resource-intensive queries or those accessed with high frequency.

    Storing data in this way also reduces the number of direct queries to base tables. This minimizes the load on these tables and results in an overall improvement in database performance.

     

  2. Efficient use of resources

    You don’t always want to have to run a full query every time you need to access a particular set of data — and it can be particularly slow and inefficient to do so. By precomputing and storing the results, materialized views optimize resource usage, minimizing the need for repetitive calculations and data processing, and saving you time and money.

     

  3. Provides snapshots of data

    Materialized views provide a snapshot of data at a specific point in time. This can be useful if you need to track how your dataset has changed for reporting and analytics purposes, as well as for historical data analysis.

     

  4. Consolidation of data

    If you need to bring together data from a variety of different sources, such as during an ETL process, materialized views are an excellent option. They can be used to aggregate and consolidate data from multiple tables and databases, providing an easy, unified and efficient way to access integrated data.

     

  5. Helps to simplify complex computations

    Complex aggregations, joins and calculations can all be precomputed and stored in materialized views. This is another way in which materialized views reduce high computational costs and make it faster and easier to query and analyze data.

Materialized view limitations

  1. Storage overhead

    One drawback of materialized views is that they require additional storage space to maintain the precomputed data they rely on. Depending on your data storage solution, the size of the view and the frequency of your updates, this additional storage requirement can create a not-insignificant expense.

     

  2. Refresh overhead

    Materialized views offer a static snapshot, so you have to undertake incremental refreshes to ensure you’re seeing the most up-to-date view in line with changes in underlying data. You can do this periodically or assign specific events that trigger how often a materialized view is refreshed. Either way, you must adopt a strategy that outlines how often a refresh is done and allocate logic and resources to compute data to guarantee its freshness.

     

  3. Performance trade-offs

    Unfortunately, the process of refreshing a materialized view can impact the overall performance of your database. This can be particularly noticeable if the refresh is resource-intensive or creates conflicts with other database operations.

     

  4. Limited applicability

    Materialized views are incredibly beneficial if a view is accessed frequently or when read performance is essential, such as in data warehousing. However, they’re not suitable for all types of queries or use cases. For example, tasks that require real-time data updates or systems with a high frequency of changes in underlying data may not work best with materialized views.

When should you use materialized views?

So, with an understanding of the difference between view and materialized view database objects, as well as how there can be advantages to using views, what are the right times to create materialized views instead? Here are some situations where using materialized views could help make the process of accessing data more efficient.

Dealing with regular batch processing

If you need to carry out batch processing regularly, materialized views enable this by precomputing and storing parts of a query to be processed independently. For example, if you manage a weekly payroll, you could use materialized views to store paycheck details, such as salary, tax and commissions for different employees. The stored data in the materialized view would then be refreshed at the end of each week.

Distributing filtered data

Using materialized views can help solve the problem of how to make particular datasets available across multiple locations, as they can be used to replicate and cache data from remote sources. This is particularly useful for read-only databases, since you can copy and distribute data to a number of different storage sites, reducing the overall load on the source database. Anyone requiring access to the data would be able to use the site closest to them, which would also speed up response times.

Optimizing AI/BI Dashboards

AI/BI Dashboards are an effective tool for enhancing data visualization and reporting on the Databricks Data Intelligence Platform. The simplified design is ideal for sharing and distributing visualizations, but it’s important that any data in your public dashboard meets freshness requirements.

Using materialized views, you can schedule periodic updates to ensure the latest view and vastly improve end-user response times by precomputing and storing query results that power the dashboard.

Analyzing time series data

One of the core advantages of materialized views is that they provide snapshots of data. This makes it significantly more straightforward to study how the underlying dataset has changed over time and provides historical data snapshots that can be useful for reporting purposes.

Therefore, materialized views are a suitable solution for a broad range of business intelligence applications. If you need to complete star schema queries or compute aggregates from raw data, materialized views store pre-aggregated summaries, such as monthly averages, weekly sums and daily counts. Visualizing these figures over time is beneficial for historical analysis and reporting purposes.

Preempting disconnection from the main database

If you’re aware that there’s a risk of disconnection from the underlying database, you can use a materialized view to keep the most important data available for working with. Of course, in that situation, you’ll need to be extra careful to make sure your refresh schedule is robust. In this case, you can cache the materialized view locally.

When should you use materialized views?

Bear in mind that there are a few scenarios when using materialized views is not the most appropriate option. For one thing, if your data is quick and easy to query, it’s not really worth the effort. Furthermore, if the source data changes very fast, there’s little point in using materialized views. Instead, it is likely that the hit to the processing overhead from constant data refreshing would outweigh any advantage you could gain from quicker data retrieval.

Create materialized views in Databricks SQL

Materialized views in Databricks SQL are managed via Unity Catalog. They store precomputed results based on the latest data from source tables. Unlike traditional implementations, Databricks materialized views retain the data state at the time of the last refresh, rather than updating every time they are queried. You have the flexibility to manually refresh the materialized views or set up scheduled automatic refreshes.

Databricks SQL materialized views are particularly useful for ETL (extract, transform, load) processing. They provide a simple and declarative approach to handle compliance, corrections, aggregations and change data capture (CDC). Materialized views significantly improve query latency and reduce costs by precomputing slow queries and frequently used computations. Additionally, they enable seamless transformations by cleaning, enriching and denormalizing base tables. In certain cases, materialized views can incrementally compute changes from the base tables, resulting in reduced costs and a streamlined user experience.

Databricks first introduced materialized views as part of the lakehouse architecture, with the launch of Delta Live Tables. Creating a materialized view in a DB SQL warehouse automatically creates a Delta Live Tables pipeline to manage view refreshes. You can easily monitor the status of refresh operations using the Delta Live Tables UI, API or CLI.

How materialized view creation works

The following example creates the materialized view customer_orders from base tables orders and customers:

CREATE MATERIALIZED VIEW customer_orders
AS 
SELECT
  customers.name,
  sum(orders.amount),
  orders.orderdate
FROM orders
  LEFT JOIN customers ON 
    orders.custkey = customers.c_custkey
GROUP BY
  name,
  orderdate;

How materialized view refresh works

In Databricks SQL, you have the option to set up automatic refresh for a materialized view based on a predefined schedule. This schedule can be configured during the creation of the materialized view using the SCHEDULE clause or added later using the ALTER VIEW statement. Once a schedule is established, a Databricks job is automatically created to handle the updates.

To review the schedule details at any time, you can use the DESCRIBE EXTENDED statement, which provides visibility into the configured schedule for the materialized view. This allows you to easily monitor and manage the automatic refresh schedule for your materialized view in Databricks SQL.

For more information on materialized views, refer to the Databricks product documentation or read the intro blog.

    Back to Glossary