At Data and AI Summit, we announced the general availability of Databricks Lakehouse Monitoring. Our unified approach to monitoring data and AI allows you to easily profile, diagnose, and enforce quality directly in the Databricks Data Intelligence Platform. Built directly on Unity Catalog, Lakehouse Monitoring (AWS | Azure) requires no additional tools or complexity. By discovering quality issues before downstream processes are impacted, your organization can democratize access and restore trust in your data.
In today’s data-driven world, high-quality data and models are essential for building trust, creating autonomy, and driving business success. Yet, quality issues often go unnoticed until it’s too late.
Does this scenario sound familiar? Your pipeline seems to be running smoothly until a data analyst flags that the downstream data is corrupted. Or, for machine learning, you don’t realize a model needs retraining until performance issues become glaringly obvious in production. Now your team is faced with weeks of debugging and rolling back changes! This operational overhead not only slows down the delivery of core business needs but also raises concerns that critical decisions may have been made on faulty data. To prevent these issues, organizations need a quality monitoring solution.
With Lakehouse Monitoring, it’s easy to get started and scale quality across your data and AI. Lakehouse Monitoring is built on Unity Catalog so teams can track quality alongside governance, without the hassle of integrating disparate tools. Here’s what your organization can achieve with quality directly in the Databricks Data Intelligence Platform:
Learn how Lakehouse Monitoring can improve the reliability of your data and AI, while building trust, autonomy, and business value in your organization.
Lakehouse Monitoring offers automated profiling for any Delta Table (AWS | Azure) in Unity Catalog out of the box. It creates two metric tables (AWS | Azure) in your account: one for profile metrics and one for drift metrics. For Inference Tables (AWS | Azure), which capture model inputs and outputs, you'll also get model performance and drift metrics. As a table-centric solution, Lakehouse Monitoring makes it simple and scalable to monitor the quality of your entire data and AI estate.
Leveraging the computed metrics, Lakehouse Monitoring automatically generates a dashboard that plots trends and anomalies over time. By visualizing key metrics such as row count, percent nulls, and numerical and categorical distribution change over time, Lakehouse Monitoring surfaces insights and identifies problematic columns. If you’re monitoring an ML model, you can track metrics like accuracy, F1, precision, and recall to identify when the model needs retraining. With Lakehouse Monitoring, quality issues are uncovered without hassle, ensuring your data and models remain reliable and effective.
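To make these metrics concrete, here is a minimal, self-contained Python sketch of how a few of them can be computed. This is an illustration of the underlying calculations only, not Lakehouse Monitoring's actual implementation; the function names are invented, and total variation distance is just one simple way to quantify categorical distribution change.

```python
# Illustrative re-implementations of a few monitoring-style metrics.
# These are simplified sketches, not the Lakehouse Monitoring internals.

def profile(column):
    """Row count and percent nulls for one column of data."""
    n = len(column)
    nulls = sum(1 for v in column if v is None)
    return {"count": n, "percent_null": 100.0 * nulls / n if n else 0.0}

def tv_distance(baseline, current):
    """Total variation distance between two categorical columns: one
    simple way to quantify 'categorical distribution change' over time."""
    categories = set(baseline) | set(current)
    def freq(col, c):
        return sum(1 for v in col if v == c) / len(col)
    return 0.5 * sum(abs(freq(baseline, c) - freq(current, c)) for c in categories)

def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary classifier, the kind of
    model performance metrics tracked for inference tables."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In practice the monitoring service computes metrics like these per column and per time window, and writes them to the profile and drift metric tables for the dashboard to plot.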
“Lakehouse Monitoring has been a game changer. It helps us solve the issue of data quality directly in the platform... it's like the heartbeat of the system. Our data scientists are excited they can finally understand data quality without having to jump through hoops.” – Yannis Katsanos, Director of Data Science, Operations and Innovation at Ecolab
Lakehouse Monitoring is fully customizable to suit your business needs. Here's how you can tailor it further to fit your use case:
Lakehouse Monitoring also ensures data and model quality by shifting from reactive processes to proactive alerting. With our new Expectations feature, you’ll be notified of quality issues as they arise.
Databricks brings quality closer to your data execution, allowing you to detect, prevent, and resolve issues directly within your pipelines.
Today, you can set data quality Expectations (AWS | Azure) on materialized views and streaming tables to enforce row-level constraints, such as dropping null records. Expectations surface issues ahead of time so you can act before they impact downstream consumers. We plan to unify expectations in Databricks, allowing you to set quality rules across any table in Unity Catalog, including Delta Tables (AWS | Azure), Streaming Tables (AWS | Azure), and Materialized Views (AWS | Azure). This will help prevent common problems such as duplicates, high percentages of null values, and distributional changes in your data, and will indicate when your model needs retraining.
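Row-level enforcement can be pictured as filtering each record against a set of named predicates and counting violations per rule. The sketch below is purely illustrative of those semantics; it is not the Databricks Expectations API, and the rule names and sample rows are invented.

```python
# Illustrative sketch of "drop row on violation" semantics.
# Not the Databricks Expectations API; rule names and data are made up.

def apply_expectations(rows, expectations):
    """Keep rows that satisfy every expectation; count violations per rule."""
    kept, violations = [], {name: 0 for name in expectations}
    for row in rows:
        ok = True
        for name, predicate in expectations.items():
            if not predicate(row):
                violations[name] += 1
                ok = False
        if ok:
            kept.append(row)
    return kept, violations

# Hypothetical rules: drop records with a null id or a non-positive amount.
expectations = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_positive": lambda r: (r.get("amount") or 0) > 0,
}
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
kept, violations = apply_expectations(rows, expectations)
```

In Databricks, expectations are configured declaratively on the table or pipeline rather than coded by hand, and violation counts are surfaced for alerting.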
To extend expectations to Delta tables, we’re adding the following capabilities in the coming months:
Don’t miss out on what’s to come and join our Preview by following this link.
To get started with Lakehouse Monitoring, simply head to the Quality tab of any table in Unity Catalog and click “Get Started”. There are three profile types (AWS | Azure) to choose from: Snapshot, Time series, and Inference.
💡Best practices tip: To monitor at scale, we recommend enabling Change Data Feed (CDF) (AWS | Azure) on your table. This enables incremental processing, so each refresh only processes data newly appended to the table rather than re-processing the entire table. As a result, execution is more efficient and you save on costs as you scale monitoring across many tables. Note that this feature is only available for Time series or Inference profiles, since Snapshot requires a full scan of the table every time the monitor is refreshed.
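Change Data Feed is turned on via a Delta table property. Assuming an illustrative table name, the documented property is set like this:

```sql
-- Enable Change Data Feed so monitor refreshes can process only new data.
-- The table name below is illustrative.
ALTER TABLE main.sales.events
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
```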
To learn more or try out Lakehouse Monitoring for yourself, check out our product links below:
By monitoring, enforcing, and democratizing data quality, we’re empowering teams to establish trust and create autonomy with their data. Bring the same reliability to your organization and get started with Databricks Lakehouse Monitoring (AWS | Azure) today.