
Extending Databricks Unity Catalog with an Open Apache Hive Metastore API

Building an open and interoperable enterprise data catalog with unified governance

Today, we are excited to announce the preview of a Hive Metastore (HMS) interface for Databricks Unity Catalog, which allows any software compatible with Apache Hive to connect to Unity Catalog. Apache Hive is the most widely supported catalog interface in the industry and is usable on virtually every major computing platform. This feature lets organizations centralize their data management, discovery, and governance in Unity Catalog and connect to it from a wide range of computing platforms, including Amazon Elastic MapReduce (EMR), open source Apache Spark, Amazon Athena, Presto, Trino, and others. Because every platform reads from the same catalog, data governance stays consistent across them.

To join this preview, contact your Databricks representative.

Hive Metastore interface for Unity Catalog

In today's rapidly evolving data management landscape, many organizations run multiple computing platforms and face the challenge of consistently implementing data discovery and governance across them. This often requires data teams to juggle multiple data catalogs and governance tools, leading to high operational overhead and difficulties in data discovery, access management, and auditing.

Databricks Unity Catalog is a unified governance solution for data, analytics, and AI, with simple features to discover data, manage permissions, audit access, track data lineage and quality, and share data across organizations. With the HMS interface, you can now connect any software that supports the industry-standard Apache Hive API to Unity Catalog, greatly simplifying governance across computing platforms.
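
As a rough illustration of what "connecting software that supports the Apache Hive API" can look like, the sketch below builds the settings that open source Apache Spark conventionally uses to reach an external Hive Metastore over Thrift. The property names are standard Spark and Hive settings. The endpoint URI and the helper name hive_metastore_spark_conf are hypothetical, since this post does not specify the actual Unity Catalog HMS endpoint or how clients authenticate to it. Treat this as a sketch under those assumptions, not as the documented setup for the preview.

```python
from urllib.parse import urlparse


def hive_metastore_spark_conf(metastore_uri: str) -> dict:
    """Build Spark settings that point a session at an external Hive Metastore.

    Hypothetical helper for illustration; it is not part of any Databricks API.
    """
    parsed = urlparse(metastore_uri)
    if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected thrift://host:port, got {metastore_uri!r}")
    return {
        # Use Spark's Hive-backed catalog instead of the in-memory one.
        "spark.sql.catalogImplementation": "hive",
        # Standard Hive client property, passed through Spark's spark.hadoop.* prefix.
        "spark.hadoop.hive.metastore.uris": metastore_uri,
    }


# Assumption: a placeholder host and port, NOT a real Unity Catalog endpoint.
EXAMPLE_URI = "thrift://unity-hms.example.com:9083"
conf = hive_metastore_spark_conf(EXAMPLE_URI)

for key, value in conf.items():
    print(f"{key}={value}")

# With PySpark installed, the same settings could be applied to a session:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("uc-hms-demo").enableHiveSupport()
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   spark.sql("SHOW DATABASES").show()
```

Other Hive-compatible engines such as Presto and Trino typically take the same kind of Thrift metastore URI in their own catalog configuration, which is why a single HMS-compatible endpoint can serve many platforms.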

In this blog post, we explore the advantages of this feature and how it can enhance your data management practices.

Why we built a Hive Metastore interface for Unity Catalog

Openness

In a diverse data ecosystem, openness plays a crucial role in seamless data integration and collaboration. Apache Hive is the most widely supported catalog API in the industry. Unity Catalog’s open HMS interface aligns with our open lakehouse platform strategy and provides a unified and standardized approach for accessing enterprise data and avoiding vendor lock-in. By using Unity Catalog with this open interface, organizations can simplify their data management architecture and ensure it can support both current and future tools.

Consistent Data Governance

Managing data across multiple platforms makes it hard to maintain consistent governance. The HMS interface in Unity Catalog tackles this challenge by extending the enterprise-grade governance provided by Unity Catalog to a diverse range of compute platforms. This integration keeps data compliance consistent, strengthens security, enforces access controls, and centralizes auditing.

Easy Path to Modernize Legacy Workloads

For customers who have long-standing legacy workloads running on different computing platforms, the HMS interface lets them centralize metadata and access controls in Unity Catalog, covering both their legacy and Databricks workloads. This offers a straightforward path for migrating workloads that run on platforms such as Amazon EMR, while ensuring consistency throughout the migration process.

Cost Optimization

The HMS interface for Unity Catalog also helps organizations reduce costs. Traditionally, organizations had to invest considerable time and resources in managing multiple catalogs, which not only incurred additional costs but also introduced complexity and potential inconsistencies. Syncing data and policies between multiple catalogs is fragile and error-prone. This integration eliminates the need to manage, maintain, and synchronize separate data catalogs.

Conclusion

The HMS interface for Unity Catalog brings together the best of both worlds: open interoperability and enterprise-grade governance. By connecting Unity Catalog with diverse compute platforms, organizations can improve data accessibility and governance, support both current and future tools, and eliminate the high operational cost of managing multiple data catalogs. This lets organizations focus on making the most of their data assets instead.

Contact your Databricks representative to join this preview.

For more updates on data and AI governance, register for our sessions at the Data and AI Summit.
