Databricks Announces Lakehouse Federation Capabilities in Unity Catalog, Providing Access to All Data
June 28, 2023
New Unity Catalog functionality allows customers to centrally discover, query, and govern all data, no matter where it lives
San Francisco, CA — June 28, 2023 — At the sold-out Data + AI Summit, Databricks, the Data and AI company, today announced new Lakehouse Federation capabilities that enable organizations to create a highly scalable and performant data mesh architecture with unified governance. These capabilities unify previously siloed data systems under the Databricks Lakehouse Platform. Lakehouse Federation in Unity Catalog will allow customers to discover, query, and govern data across all of their data platforms from within Databricks without moving or copying the data first. With this release, data silos effectively disappear within an organization and customers are able to extend the analytics capabilities of their unified lakehouse.
For most organizations, data is scattered across many operational and analytics systems. This fragmentation makes it difficult for data teams to discover available information and for compliance teams to maintain consistent governance. Additionally, it is costly and time-consuming to combine this data, because the integration processes depend on complex data engineering that delays data availability and ultimately slows down innovation.
New functionality within Unity Catalog — Databricks’ flagship solution for unified search and governance across data, analytics and AI — addresses these critical pain points and makes it simple for organizations to expose and govern siloed data systems as an extension of their lakehouse. New and future capabilities include:
- Query federation: New catalog and querying capabilities enable customers to effortlessly consolidate and map all their data assets from various platforms outside Databricks, including MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, Google’s BigQuery, and more. Users can now discover, secure, audit and access all of their data from a single interface, with a simplified and unified experience. Advanced query planning and caching ensure optimal query performance even when accessing multiple platforms with a single query.
- Governance beyond Databricks: With Unity Catalog, customers benefit from consistent access policies on tables, rows, columns and tags on any data asset registered in Unity Catalog. In the future, customers will also be able to define data access policies in Unity Catalog and seamlessly push those policies to other data warehouses for consistent enforcement wherever data is accessed, eliminating the need to maintain redundant policy definitions.
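To make the idea of a single query crossing data silos concrete, here is a minimal sketch in Python. It is an analogy only and does not use any Databricks or Unity Catalog API: it relies on the standard-library `sqlite3` module, and the `crm` schema, the table names, and the sample rows are all hypothetical. A second database is attached in place, with no copying of data between systems, and then joined against a local table in one statement.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Set up two separate in-memory databases reachable from one connection.

    Assumption: "main" stands in for the lakehouse and "crm" for an external
    system. Both names are illustrative, not Databricks identifiers.
    """
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent database instead of importing its data.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")

    # Hypothetical sample rows, present only so the query returns something.
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 40.0), (1, 10.0), (2, 25.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Join tables that live in two different databases with a single query."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Calling `revenue_by_customer(build_federated_connection())` returns one row per customer with the summed order amount. In an actual federation setup the attached source would be a remote system such as PostgreSQL or Snowflake, and access would be governed by the catalog rather than by the connection itself.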
Databricks also recently announced a Hive Metastore (HMS) interface for Unity Catalog, enabling all software compatible with Apache Hive to connect with Unity Catalog. Now organizations can centralize their data management, discovery, and governance in Unity Catalog, and connect to it from a wide range of computing platforms, including Amazon EMR, Apache Spark, Amazon Athena, Presto, Trino, and others. The new interface eliminates the need for maintaining multiple data catalogs and ensures consistent data governance across these platforms.
The combination of these Lakehouse Federation enhancements provides customers with a consistent data serving and governance layer for their data mesh architecture, allowing for distributed domain ownership while reducing complicated data integration tasks, saving storage costs from multiple copies of the same data, and helping to improve their overall data security and governance posture.
“We’re cementing Databricks as the most open and flexible lakehouse platform for data, analytics and AI. We’re making it clear that we want to help you unify all your data, no matter where it lives, no matter the format,” said Matei Zaharia, Co-Founder and Chief Technologist at Databricks. “We’re excited to see what customers do with this new functionality. We’re giving organizations access to all of the data they need through one system, which will lead to more innovation — and the best part about that innovation is that it doesn’t sacrifice security. By enabling customers to easily apply the rules consistently across platforms and track data usage, we’ll help them meet compliance requirements while pushing their businesses forward.”
With the new Lakehouse Federation capabilities, Databricks customers will benefit from the following:
- Data democratization and discoverability: Databricks is giving users a common approach to securely discover and explore all data — structured and unstructured — no matter where it lives. A single query can now cross data silos.
- Faster access to data: Organizations can rapidly expose domain-owned data sources for data, analytics and AI use cases with no ingestion required. Databricks Lakehouse caching and optimization then transparently accelerate interactive queries.
- Unified governance across all data sources: The enhancements let users rely on a single permission model for their entire data estate, providing unified data governance with built-in data lineage and auditability across their entire data mesh.
“Bayer’s vision is ‘health for all, hunger for none.’ Our PlantBalance sensor and data solution helps glasshouse growers to continuously monitor plant behavior and enable early detection of growth issues and fast feedback on their crop decisions. This makes fact-based analysis and data-driven decisions a lot easier. With Databricks’ Lakehouse Federation, data scientists and business users alike can now access diverse data sources through a uniform user interface with consistent permissions managed in one place,” said Jelle de Jong, Tech Lead at Bayer. “We’re continuously standardizing our data format to Delta Lake, but we’re thrilled that query federation has allowed us to iterate with agility before investing in data extraction.”
“At SEGA Europe, we aim to entertain the world with creative, innovative experiences — and being able to quickly discover and leverage data helps us deliver the best possible experience for our players. Lakehouse Federation gives us the ability to combine data — like usage, sales and game telemetry data — from multiple sources, across multiple clouds and view and query it all from one place. Now we leave the data in the original data source, but can utilize it from the Databricks Lakehouse,” said Felix Baker, Head of Data Services at SEGA Europe. “Since we no longer have to move our finance data, which is refreshed frequently, it saves us valuable time that can be focused on giving our consumers the best possible gaming experience.”
"The energy industry is going through a massive transition and data is an important enabler to make our business more effective and efficient. Being able to manage our data is essential, and Databricks has always understood and helped prioritize that for us," said Bryce Bartmann, Chief Digital Technology Advisor at Shell. "Lakehouse Federation has enabled us to move more quickly to consolidate our existing data landscape into Unity Catalog. This makes Shell's data governance simpler – more datasets become discoverable in one place, authentication is standardized and querying across datasets with a common programming language becomes possible. Ultimately, it makes us more effective in navigating the transformation happening in the energy sector today."
Databricks continues to expand the Lakehouse Platform, recently announcing Lakehouse Apps and the general availability of Databricks Marketplace, Delta Lake 3.0, LakehouseIQ, and a suite of data-centric AI tools for building and governing LLMs on the lakehouse.
Availability
Lakehouse Federation and the Hive Metastore interface will be available for customers in public preview soon. Databricks customers can sign up for the waitlist now at https://databricks.com/qfpreview.
To learn more about Databricks’ enhancements to Unity Catalog, watch the Data + AI Summit live: https://www.databricks.com/dataaisummit/watch
About Databricks
Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Delta Lake, Apache Spark™, and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.
Contact: [email protected]
Safe Harbor Statement
This information is provided to outline Databricks’ general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks’ discretion and may not be delivered as planned or at all.