Today, we are excited to announce that Lakehouse Federation in Unity Catalog is now Generally Available (GA) across AWS, Azure, and GCP! Lakehouse Federation allows you to discover, query, and govern all your data in one place. With this GA release, you can expect enhanced stability, security, and enterprise readiness for your federated workloads.
In this blog post, we go over the GA capabilities of Lakehouse Federation, explore how it’s powering agile analytics at the world’s leading companies and discuss what’s next.
Organizations worldwide, regardless of size or industry, are leveraging data and AI to drive innovation. However, due to historical, organizational, or technological reasons, data often remains dispersed across multiple operational and analytical systems. This fragmentation leads to several challenges:
Lakehouse Federation addresses these critical pain points and makes it simple for organizations to expose, query, and govern siloed data systems as an extension of their lakehouse. With these new capabilities, you can:
Over 5,000 Databricks customers are leveraging Lakehouse Federation to unify their data estates, ensuring consistent data discovery and governance.
"Lakehouse Federation has allowed us to combine all our data assets across multiple data warehouses and databases under Unity Catalog, simplifying data discovery and access management. This unlocks a variety of use cases, including ingest and ad hoc querying, making our analytics easier than ever." - Alexander Booth, Assistant Director of Research with the Texas Rangers
We’re excited to announce General Availability for MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, SQL Server and Azure Synapse connectors.
This release marks an important milestone across a few areas:
"Lakehouse Federation has helped us consolidate our data landscape with consistent governance in one place and generate significant operational efficiency gains. Data insights and quality are now seamlessly integrated, allowing us to focus on providing our clients with the best insights to maximize value from their advertising investments." - Bob Wuisman, Global Head of Production at Ebiquity plc.
Discover, govern and access data from Hive Metastore (HMS) and AWS Glue with Lakehouse Federation. With Catalog Federation, you’ll be able to easily mount any external (or internal Databricks) HMS as a foreign catalog in Unity Catalog.
For users of the internal Databricks HMS, this is a straightforward way to get started with Unity Catalog and benefit from its unified governance capabilities.
For users of external HMS and AWS Glue, it provides a tightly integrated way to access external metastore data right from Unity Catalog without changing your workflows.
Catalog Federation is currently in Private Preview.
Expanding the list of supported data sources for Lakehouse Federation remains a top priority in our mission to help customers unify their data estates. We are excited to announce that the Google BigQuery and Salesforce Data Cloud connectors are now in Public Preview. With BigQuery, data warehouse federation support now spans all three major cloud providers.
Oracle and Teradata connectors will be available for preview soon.
To provide a faster query experience against data warehouses, which tend to hold larger tables, we’re adding capabilities for automatic, high-throughput data transfers.
In the future, starting with the Amazon Redshift and Snowflake connectors, you’ll be able to query and materialize tables from data warehouses quickly. Behind the scenes, Lakehouse Federation will use faster bulk APIs (for example, offloading results in parallel to object storage or a staging location) and then fetch those results in parallel, so the driver is no longer a bottleneck. All of this will happen without any user intervention!
Finally, sharing Lakehouse Federation data is set to become much easier. The upcoming Delta Sharing integration will allow customers to share federated tables externally without the recipients needing access to Databricks or the underlying data system. This will streamline data sharing by eliminating the need for redundant copies across different systems.