In this blog post, we will share how you can use Databricks SQL Materialized Views with AI/BI Dashboards to deliver fresh data and insights to your business.
We recently announced the public preview of AI/BI Dashboards on the Databricks Databricks Data Intelligence Platform. AI/BI Dashboards represent a significant step forward in creating visualizations and reporting experiences for Databricks customers. They provide significant improvements in visualization and a simplified design experience that is optimized for sharing and distribution.
AI/BI Dashboards run on the Databricks SQL (DBSQL) data warehouse. DBSQL lets you run all of your SQL and BI applications at scale with your tools of choice at a fraction of the cost of legacy cloud data warehouses.
Materialized Views help deliver fresh data to the business
Materialized views (MVs) are like regular views but are pre-computed and automatically kept up-to-date as the sources change. End-user queries and dashboards are fast because they are querying data that is precomputed instead of directly querying potentially massive data volumes in the source tables. We saw an improvement in dashboard performance in our internal use case that involved the reporting of AI/BI's usage data and event logs. The dashboard response time reduced significantly from over thirty seconds to just 1-2 seconds per query. Query speed improvements are heavily dependent on the MV query definition so users are encouraged to experiment on their own data.
Let's take a closer look at how they work and their benefits.
- First, MVs can quickly process complex transformations in a declarative way - they automatically handle updates/deletes and any changes to the source and can be defined for any query, making them ideal for ETL.
- Data flows faster and users get more up-to-date results because MVs incrementally refresh updates to their sources, which avoids the need to rebuild the view when new data arrives, lowering overall compute time.
- End-user queries and dashboards are faster because data is pre-computed - it is much faster to query pre-computed data vs. querying the massive data volumes in the base tables.
MVs can also be used in conjunction with Streaming Tables for incrementally ingesting data into the Databricks SQL data warehouse, empowering your users to build end-to-end ETL pipelines in a streaming fashion.
Step-by-step Guide - Using MVs to speed up AI/BI Dashboards
In the following guide, we will provide step-by-step instructions on how to leverage MVs to speed up your AI/BI Dashboards.
MVs are best used to summarize large source tables and complex transformations, for example computing aggregates from raw data.
Step 1: Create a materialized view
Let's say you need to analyze the total revenue generated from orders, segmented by the date and priority level of each order, which has to be refreshed hourly. Your typical SQL query would be:
You could use the above query in your AI/BI Dashboard, but you would still be missing the data freshness requirement: you still need to find a way to run the query every hour. Also, if the data volume is large, creating a new table may take longer than one hour in which case you could never meet your data freshness requirement. Using a materialized view makes it easy to set a schedule and because the MV's may be incrementally refreshed, the updates can be very fast as compared to a full recompute of the table.
To use this query in a materialized view, simply wrap the query in a CREATE statement for a materialized view; put it in a DBSQL Query; and run it, like so:
The schedule operator tells the MV to update periodically, in this case the cron syntax specifies the MV to refresh every hour. Without that part, your dashboard would never pick up new data that has arrived since you created the materialized view. See the CREATE MATERIALIZED VIEW and quartz cron syntax documentation for more information. You may also see the query definition and other details about your MV in the catalog explorer.
Step 2: Give others access to your materialized view
You need to give others access to your materialized view so that they can use it when they're looking at your dashboard by following these steps:
- Give USE SCHEMA access to the catalog that the materialized view is in by running the following command:
GRANT USAGE ON SCHEMA demo.dss_acme TO user_or_role;
- Grant SELECT privileges on the MV with the following command:
GRANT SELECT ON demo.dss_acme.orders_mw TO user_or_role;
You may alternatively use the Databricks Catalog Explorer UI to manage permissions instead of running SQL queries. See the documentation for more information.
Step 3: Use the materialized view in your AI/BI Dashboard
Now you have an MV that has been created to pre-compute your original AI/BI Dashboard query. The last step is to update the existing dashboard to replace the SQL to query this new MV instead of the original one.
Go back to your AI/BI Dashboard to update the code as follows:
select * from demo.dss_acme.orders_mw;
Step 4: Enjoy that faster AI/BI Dashboard!
…and that should be it! If everything worked correctly, your dashboard should be fast now!
Conclusion
AI/BI Dashboards are a powerful visualization and reporting tool on the Databricks Data Intelligence Platform. Materialized Views are a new capability that can be used to significantly improve end-user response times for AI/BI Dashboards. With a few clicks, you'll be able to quickly create a faster end-user experience by combining MVs with AI/BI Dashboards.
You must be enrolled in the public preview to create and manage materialized views. You may request access to the public preview of Materialized Views by clicking on this link. Find documentation for Materialized Views and AI/BI Dashboards at these links.