
Announcing General Availability of Predictive Optimization

Customers report up to 2x faster queries and 50% lower storage costs, all without lifting a finger
Michelle Leon
Cindy Jiang
Vijayan Prabhakaran

We're excited to announce the General Availability of Databricks Predictive Optimization. This capability intelligently optimizes your table data layouts for faster queries and reduced storage costs.

Predictive Optimization harnesses Unity Catalog and is powered by the Data Intelligence Engine to determine the best optimizations to perform on your data and run those operations automatically on serverless infrastructure.

Data teams previously had to manage maintenance operations manually. Now the Databricks Data Intelligence Platform handles them for you, reducing management complexity and improving performance and cost-efficiency out of the box.

Get started today by enabling Predictive Optimization from your account console.

Data layout optimization is a hard problem

Proper table maintenance significantly improves query performance and cost efficiency by optimizing the data lake for your organization's unique needs. However, getting this right requires technical expertise, manual overhead, and continuous adjustments as your organization's data and use cases evolve.

Data engineering teams need to figure out:

  • Which optimizations should be run?
  • Which tables should be optimized?
  • How frequently should the optimizations be run?

Once these questions are answered, teams must then manage the operational overhead of running these optimizations, such as scheduling jobs, diagnosing failures, and managing the underlying infrastructure.

Furthermore, this is not a one-time setup: teams must continuously update these jobs as data grows, new tables are added, and access patterns change. As data and AI use cases have exploded within organizations, many customers have told us they cannot keep up with optimizing the tables created by expanding business needs.
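To make that burden concrete, here is a minimal, hypothetical sketch of the kind of DIY maintenance script teams end up writing. It answers the three questions above with hand-tuned rules and only builds the SQL statements (`OPTIMIZE`, `VACUUM`) rather than running them. The `TableStats` fields, the thresholds, and the `plan_maintenance` helper are illustrative assumptions, not a Databricks API.

```python
from dataclasses import dataclass


# Hypothetical per-table metadata a DIY job might collect (field names are illustrative).
@dataclass
class TableStats:
    name: str
    num_files: int
    avg_file_size_mb: float
    days_since_vacuum: int
    reads_last_7_days: int


# Hand-tuned thresholds (assumptions): exactly the knobs that drift as data and usage change.
SMALL_FILE_MB = 32.0
MIN_FILES_TO_COMPACT = 100
VACUUM_EVERY_DAYS = 7
MIN_READS = 1


def plan_maintenance(tables: list[TableStats]) -> list[str]:
    """Return SQL maintenance statements chosen by simple static rules."""
    statements: list[str] = []
    for t in tables:
        # Which tables? Skip tables that nobody queried recently.
        if t.reads_last_7_days < MIN_READS:
            continue
        # Which optimization? Compact tables that have many small files.
        if t.num_files >= MIN_FILES_TO_COMPACT and t.avg_file_size_mb < SMALL_FILE_MB:
            statements.append(f"OPTIMIZE {t.name}")
        # How often? Vacuum on a fixed cadence regardless of actual need.
        if t.days_since_vacuum >= VACUUM_EVERY_DAYS:
            statements.append(f"VACUUM {t.name}")
    return statements


tables = [
    TableStats("main.sales.orders", 5000, 4.0, 10, 250),
    TableStats("main.sales.archive", 20000, 2.0, 30, 0),
    TableStats("main.ops.events", 50, 128.0, 2, 40),
]
print(plan_maintenance(tables))
# ['OPTIMIZE main.sales.orders', 'VACUUM main.sales.orders']
```

Every threshold in a script like this has to be revisited by hand as tables grow and access patterns shift, which is the ongoing work Predictive Optimization is meant to take over.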

Predictive Optimization solves data management challenges for you

With Predictive Optimization, Databricks takes care of all of this for you with AI and Unity Catalog, enabling you to focus on driving business value.

Intelligent analysis

Predictive Optimization intelligently determines the best schedule of optimizations by leveraging Unity Catalog and the Data Intelligence Engine. Our AI model combines your organization's query patterns with factors such as data layout, table properties, and performance characteristics to determine the most impactful optimizations to run.

For many customers, the impact and ROI are immediate. For example, the team at Plenitude, a large energy company, saw significant benefits soon after enabling Predictive Optimization.

"Databricks Predictive Optimization consistently helps the FinOps group minimize storage costs. We've immediately seen a 26% drop in storage costs, and we expect additional incremental savings going forward. The capability has enabled us to retire procedures, scripts, and manual maintenance operations, allowing us to achieve greater out-of-the-box scalability."
— Alessandro Caronia, Infrastructure Operations Manager and Simona Fiazza, End to End Operations Manager at Plenitude

Adaptive learning

Predictive Optimization also learns from your organization's data usage patterns over time and adjusts automatically. This keeps your data stored in the most efficient layout, translating to cost savings and performance gains without continuous manual intervention.

This self-driving system fully replaces manual solutions, like the one at Toloka AI, an AI data annotation platform.

"Thanks to Predictive Optimization (PO), we were able to decommission our DIY solution for table maintenance. PO is more efficient and cost-effective, as it optimizes only the tables that benefit from maintenance operations. PO simplifies our data platform, allowing for better allocation of resources and a more streamlined data management process."
— Nikita Bochkarev, Senior Data Engineer at Toloka AI

Automatic Liquid Clustering

New since the Preview: Predictive Optimization now automatically runs OPTIMIZE on tables that use Liquid Clustering, in addition to VACUUM and compaction. You no longer have to schedule clustering or decide how often to run it; Predictive Optimization clusters at an optimal cadence for better query performance.

Impact in numbers

Since launching as a Preview, Predictive Optimization has intelligently run optimizations over hundreds of thousands of tables comprising exabytes of data. These optimizations improve query performance by optimizing file size and layout on disk and have generated millions in annual storage savings for customers.

Preview customers like Anker have reported 2x improvements in query performance and 50% storage savings.

"Databricks' Predictive Optimizations intelligently optimized our Unity Catalog storage, which saved us 50% in annual storage costs while speeding up our queries by >2x. It learned to prioritize our largest and most-accessed tables. And, it did all of this automatically, saving our team valuable time."
— Shu Li, Data Engineering Lead at Anker

Coming Soon

Predictive Optimization will come with a built-in observability dashboard that provides insights into the optimizations performed and their impact on query performance and storage savings, making the benefits of Predictive Optimization transparent and measurable. If you want to look further under the hood, all operations are already logged in a system table, so you get full visibility.

Soon, Predictive Optimization will also automatically collect statistics during supported write operations. It will intelligently update the statistics used to optimize query plans by running ANALYZE in the background. These background operations run only as needed, as determined by logic that tracks when statistics are stale and when the workload needs them. If you are interested in participating in the Automatic Statistics Private Preview or in the initial phase of Public Preview, fill out this form and we will contact you.
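The post does not say how staleness is detected. Purely as a hypothetical illustration of the idea (not Databricks' actual logic), one simple rule refreshes statistics only when the fraction of rows modified since the last `ANALYZE` exceeds a threshold and the table is actually being queried. The `StatsState` fields, the 10% threshold, and the `needs_analyze` and `analyze_statements` helpers are assumptions made for this sketch.

```python
from dataclasses import dataclass


# Hypothetical tracking state per table; all field names are illustrative assumptions.
@dataclass
class StatsState:
    table: str
    rows_at_last_analyze: int
    rows_modified_since: int  # inserts + updates + deletes since the last ANALYZE
    queried_recently: bool


STALENESS_THRESHOLD = 0.10  # hypothetical: refresh once more than 10% of rows changed


def needs_analyze(s: StatsState) -> bool:
    """Refresh only if the statistics are stale AND the workload needs them."""
    if not s.queried_recently:
        return False
    if s.rows_at_last_analyze == 0:
        # Empty at the last ANALYZE: any modification makes the statistics stale.
        return s.rows_modified_since > 0
    return s.rows_modified_since / s.rows_at_last_analyze > STALENESS_THRESHOLD


def analyze_statements(states: list[StatsState]) -> list[str]:
    """Build ANALYZE statements for the tables whose statistics should be refreshed."""
    return [f"ANALYZE TABLE {s.table} COMPUTE STATISTICS" for s in states if needs_analyze(s)]


states = [
    StatsState("main.sales.orders", 1_000_000, 250_000, True),    # 25% changed, queried
    StatsState("main.sales.archive", 1_000_000, 500_000, False),  # stale but unused
    StatsState("main.ops.events", 1_000_000, 50_000, True),       # only 5% changed
]
print(analyze_statements(states))
# ['ANALYZE TABLE main.sales.orders COMPUTE STATISTICS']
```

Gating on both conditions avoids spending compute on tables whose statistics are stale but never used to plan queries.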

In the near future, Predictive Optimization will be enabled by default across all Unity Catalog managed tables, so that you get optimized data layouts, efficient storage, and more, without lifting a finger. We are always adding new capabilities to improve your query performance and efficiency. Stay tuned for more over the next few months.

Get started today

Get started today by selecting Enabled next to Predictive Optimization in the account console under Settings > Feature enablement.

With a single click, Predictive Optimization's intelligence engine will begin making your data faster and more cost-effective. See the documentation for more information.
