Open Sourcing Unity Catalog

Type

On-Demand Video

Duration

5 minutes 31 seconds

What you’ll learn

In this video, you will learn about Unity Catalog, the industry’s first open source catalog for data and AI governance across clouds, data formats and data platforms. Here are the most important pillars of the Unity Catalog vision:

  • Open source API and implementation: It is built on an OpenAPI spec and an open source server implementation under Apache License 2.0. It is also compatible with the Apache Hive Metastore API and the Apache Iceberg™ REST catalog API.
  • Multi-format support: It is extensible and supports Delta Lake, Apache Iceberg via UniForm, Apache Parquet, CSV and many other formats.
  • Multi-engine support: With its open APIs, data cataloged in Unity Catalog can be read by many leading compute engines.
  • Multimodal: It supports all your data and AI assets, including tables, files, functions and AI models.
  • Vibrant ecosystem: This is a community effort and we are extremely excited to be supported by Amazon Web Services, Microsoft Azure, Google Cloud, NVIDIA, Salesforce, DuckDB, LangChain, dbt Labs, Fivetran, Confluent, Unstructured, Onehouse, Immuta, Informatica and many more.
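
Because the API is an open REST interface, any HTTP client can query the catalog. The snippet below is a minimal sketch, not an official client. It assumes an open source Unity Catalog server running locally on port 8080, the `/api/2.1/unity-catalog` path prefix, and a catalog named `unity` with a schema named `default`. The helper names `build_url` and `list_resource` are hypothetical and exist only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local open source Unity Catalog server on port 8080.
BASE_URL = "http://localhost:8080/api/2.1/unity-catalog"


def build_url(resource, base_url=BASE_URL, **params):
    """Return the URL for a list-style GET request against `resource`."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/{resource}"
    return f"{url}?{query}" if query else url


def list_resource(resource, key, **params):
    """Fetch one page of results and return the list stored under `key`."""
    with urllib.request.urlopen(build_url(resource, **params)) as response:
        payload = json.load(response)
    return payload.get(key, [])


if __name__ == "__main__":
    # Assumption: catalog "unity" and schema "default" exist on the server.
    tables = list_resource(
        "tables", "tables", catalog_name="unity", schema_name="default"
    )
    for table in tables:
        # .get() avoids a KeyError if a field is missing from the response.
        print(table.get("name"), table.get("data_source_format"))
```

The same pattern should work for other list endpoints, for example `list_resource("catalogs", "catalogs")`, provided the server exposes them at the assumed paths.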

The project is available on GitHub today as the first step in our journey toward bringing the Unity Catalog vision into open source. Unity Catalog is hosted at LF AI & Data, an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence and data. We are excited to work with the open source communities in the years to come to realize this vision.