Announcing General Availability of Lakehouse Federation
Today, we are excited to announce that Lakehouse Federation in Unity Catalog is now Generally Available (GA) across AWS, Azure, and GCP! Lakehouse Federation allows you to discover, query, and govern all your data in one place. With this GA release, you can expect enhanced stability, security, and enterprise readiness for your federated workloads.
In this blog post, we go over the GA capabilities of Lakehouse Federation, explore how it’s powering agile analytics at the world’s leading companies and discuss what’s next.
Lakehouse Federation Primer
Organizations worldwide, regardless of size or industry, are leveraging data and AI to drive innovation. However, due to historical, organizational, or technological reasons, data often remains dispersed across multiple operational and analytical systems. This fragmentation leads to several challenges:
- Difficulty in discovering and accessing all data
- Slow execution due to engineering bottlenecks
- Weak compliance across siloed systems
Lakehouse Federation addresses these critical pain points and makes it simple for organizations to expose, query, and govern siloed data systems as an extension of their lakehouse. With these new capabilities, you can:
- Build a unified view of your data estate: Automatically classify and discover all your data, structured and unstructured, in one place and enable everyone in your organization to securely access and explore all the data available at their fingertips - no matter where it lives.
- Query and combine all data efficiently with a single engine: Accelerate ad hoc analysis and prototyping across all your data, analytics, and AI use cases on the most complete data, with no ingestion required. Advanced query planning across sources, together with caching, keeps query performance high even when a single query accesses and combines data from multiple platforms.
- Safeguard data across data sources: Use one permission model to set and apply access rules and safeguard all your data across data sources. Apply rules such as row- and column-level security and tag-based policies consistently across platforms, audit access centrally, track data usage, and meet compliance requirements with built-in data lineage and auditability.
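In practice, the capabilities listed above come down to a handful of SQL statements. The sketch below assembles them in Python so the shape of a typical setup is visible: a connection, a foreign catalog that mirrors a remote database, grants for a group, and a query that joins federated and lakehouse data. The statement syntax follows the Unity Catalog SQL reference as we understand it, so please check the documentation for your cloud before relying on it. Every name here (`pg_sales_conn`, `sales_federated`, `main.crm.customers`, the `analysts` group, the secret scope and key) is hypothetical.

```python
def sql_literal(value):
    """Render a value as a single-quoted SQL string literal."""
    return "'" + str(value).replace("'", "''") + "'"


def create_connection(name, source_type, host, port, user, secret_scope, secret_key):
    """Statement that registers an external system; the password is read from a secret."""
    return (
        f"CREATE CONNECTION {name} TYPE {source_type}\n"
        "OPTIONS (\n"
        f"  host {sql_literal(host)},\n"
        f"  port {sql_literal(port)},\n"
        f"  user {sql_literal(user)},\n"
        f"  password secret({sql_literal(secret_scope)}, {sql_literal(secret_key)})\n"
        ")"
    )


def create_foreign_catalog(catalog, connection, database):
    """Statement that exposes one remote database as a catalog in Unity Catalog."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (database {sql_literal(database)})"
    )


def grant_read(catalog, schema, table, principal):
    """Statements that let a principal read one federated table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# One query that combines a federated PostgreSQL table with a lakehouse table.
FEDERATED_QUERY = (
    "SELECT c.segment, SUM(o.amount) AS revenue\n"
    "FROM sales_federated.public.orders AS o\n"
    "JOIN main.crm.customers AS c ON o.customer_id = c.id\n"
    "GROUP BY c.segment"
)

# All identifiers below are illustrative placeholders, not real resources.
statements = [
    create_connection(
        "pg_sales_conn", "postgresql", "pg.example.com", 5432,
        "federation_reader", "federation", "pg_password",
    ),
    create_foreign_catalog("sales_federated", "pg_sales_conn", "sales"),
    *grant_read("sales_federated", "public", "orders", "analysts"),
    FEDERATED_QUERY,
]

for statement in statements:
    print(statement + ";\n")
```

In a Databricks notebook, each statement could be passed to `spark.sql(...)` instead of being printed. Keeping the password behind `secret(...)` avoids placing credentials in notebook source.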
Over 5,000 Databricks customers are leveraging Lakehouse Federation to unify their data estates, ensuring consistent data discovery and governance.
"Lakehouse Federation has allowed us to combine all our data assets across multiple data warehouses and databases under Unity Catalog, simplifying data discovery and access management. This unlocks a variety of use cases, including ingest and ad hoc querying, making our analytics easier than ever."— Alexander Booth, Assistant Director of Research with the Texas Rangers
General Availability
We’re excited to announce General Availability for MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, SQL Server and Azure Synapse connectors.
This release marks an important milestone across a few areas:
- Improved performance: With this release, we’ve significantly increased the coverage of expressions and operators that we can push down (i.e., delegate to the underlying database) to SQL Server, Postgres, MySQL, Snowflake, Redshift, and Synapse connections. In practice, this means lower-latency queries and faster Materialized View (MV) creation, all without requiring users to modify their queries.
- Enhanced stability and observability: We’ve updated our federation and pushdown framework to be more resilient and handle failure scenarios without impacting user workloads. We’ve also introduced improved Query Profiles to support federation-specific metadata and statistics, giving administrators better ways to monitor and audit.
- New security options: Starting with Azure ecosystem sources and Snowflake, we’re adding support for passwordless authentication options, Azure AD/Entra ID support for Azure SQL, and OAuth support for Snowflake. In the upcoming months, we’ll also be building out similar capabilities for the AWS/Google ecosystems.
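To make the pushdown idea from the first bullet concrete, here is a toy sketch. It is not how the Databricks planner is implemented. An in-memory SQLite database stands in for the remote source, and a hypothetical `run_federated_query` helper sends only the predicates whose operators are considered pushable, then applies the rest locally. Widening the set of pushable operators leaves the result unchanged but reduces the number of rows that cross the connection.

```python
import operator
import sqlite3
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str  # one of the keys in LOCAL_OPS
    value: Any


# How each operator is evaluated locally when it cannot be pushed down.
LOCAL_OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def run_federated_query(conn, table, predicates, pushable_ops):
    """Return (matching rows, number of rows fetched from the remote source)."""
    pushed = [p for p in predicates if p.op in pushable_ops]
    residual = [p for p in predicates if p.op not in pushable_ops]

    # Pushed predicates become part of the SQL sent to the remote database.
    sql = f"SELECT * FROM {table}"
    if pushed:
        sql += " WHERE " + " AND ".join(f"{p.column} {p.op} ?" for p in pushed)
    cursor = conn.execute(sql, [p.value for p in pushed])
    columns = [description[0] for description in cursor.description]
    fetched = [dict(zip(columns, row)) for row in cursor.fetchall()]

    # Residual predicates are applied after the rows have been transferred.
    result = [
        row for row in fetched
        if all(LOCAL_OPS[p.op](row[p.column], p.value) for p in residual)
    ]
    return result, len(fetched)


# Stand-in "remote" database with a hypothetical orders table.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 50.0), (2, "EU", 250.0), (3, "US", 300.0), (4, "EU", 120.0)],
)

filters = [Predicate("region", "=", "EU"), Predicate("amount", ">", 100)]

# Narrow coverage: only "=" is pushed, so "amount > 100" runs locally.
narrow_rows, narrow_fetched = run_federated_query(remote, "orders", filters, {"="})
# Wider coverage: both predicates are evaluated by the remote database.
wide_rows, wide_fetched = run_federated_query(remote, "orders", filters, {"=", ">"})

print(narrow_fetched, wide_fetched)  # prints: 3 2
```

Both calls return the same two orders. With the wider coverage, the remote database filters them out before transfer, which is why broader pushdown can lower latency without any change to the query itself.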
"Lakehouse Federation has helped us consolidate our data landscape with consistent governance in one place and generate significant operational efficiency gains. Data insights and quality are now seamlessly integrated, allowing us to focus on providing our clients with the best insights to maximizing value from their advertising investments."— Bob Wuisman, Global Head of Production at Ebiquity plc.
What's next?
Catalog Federation
Discover, govern and access data from Hive Metastore (HMS) and AWS Glue with Lakehouse Federation. With Catalog Federation, you’ll be able to easily mount any external (or internal Databricks) HMS as a foreign catalog in Unity Catalog.
For users of the internal Databricks HMS, this is a straightforward way to get started with Unity Catalog and benefit from its unified governance capabilities.
For users of external HMS and AWS Glue, it provides a tightly integrated way to access external metastore data right from Unity Catalog without changing your workflows.
Catalog Federation is currently in Private Preview.
New Connectors
Expanding the list of supported data sources for Lakehouse Federation remains a top priority in our mission to help customers unify their data estates. We are excited to announce that the Google BigQuery and Salesforce Data Cloud connectors are now in Public Preview. With BigQuery, data warehouse federation support now spans all three major cloud providers.
Oracle and Teradata connectors will be available for preview soon.
High Throughput Data Warehouse Connections
To provide a faster query experience against data warehouses, which tend to hold larger tables, we’re adding capabilities to do automatic high-throughput data transfers.
In the future, starting with the Amazon Redshift and Snowflake connectors, you’ll be able to query and materialize tables from data warehouses quickly. Behind the scenes, Lakehouse Federation will leverage faster bulk APIs (e.g., offloading results to object storage or a staging location in parallel) and fetch these results in parallel, avoiding a driver bottleneck. All without any user intervention!
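The unload-then-fetch pattern described above can be illustrated with a small, self-contained sketch. This is only an analogy under stated assumptions, not the connector's implementation: a temporary directory stands in for the staging location, CSV part files stand in for the warehouse's bulk unload, and a thread pool stands in for cluster workers reading parts concurrently. The helper names `unload_to_staging`, `read_part`, and `fetch_in_parallel` are hypothetical.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unload_to_staging(rows, staging_dir, num_parts):
    """Stand-in for a warehouse bulk unload: split rows across part files."""
    paths = []
    for part in range(num_parts):
        path = Path(staging_dir) / f"part-{part:05d}.csv"
        with path.open("w", newline="") as handle:
            # Every num_parts-th row goes to this part file.
            csv.writer(handle).writerows(rows[part::num_parts])
        paths.append(path)
    return paths


def read_part(path):
    """Read one staged part file back into typed rows."""
    with path.open(newline="") as handle:
        return [(int(order_id), float(amount)) for order_id, amount in csv.reader(handle)]


def fetch_in_parallel(paths, max_workers=4):
    """Read all part files concurrently rather than through one sequential reader."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(read_part, paths))
    return [row for part in parts for row in part]


source_rows = [(i, i * 1.5) for i in range(1000)]

with tempfile.TemporaryDirectory() as staging:
    part_paths = unload_to_staging(source_rows, staging, num_parts=8)
    fetched_rows = fetch_in_parallel(part_paths)

print(len(fetched_rows))  # prints: 1000
```

Because each part file is independent, no single reader has to stream the whole table, which is the intuition behind avoiding a driver bottleneck.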
Sharing for Lakehouse Federation
Finally, sharing Lakehouse Federation data is set to become much easier. The upcoming Delta Sharing integration will allow customers to share federated tables externally without the recipients needing access to Databricks or the underlying data system. This will streamline data sharing by eliminating the need for redundant copies across different systems.
Get Started
- Read our documentation (AWS, Azure, GCP) to get started with Lakehouse Federation
- Watch the Lakehouse Federation session from Data + AI Summit 2024 for a deep dive
- Watch Matei Zaharia, co-founder and Chief Technology Officer at Databricks, deliver the keynote address at Data + AI Summit 2023 to learn more about the latest announcements in Unity Catalog!