
Announcing Public Preview of AI Generated Documentation In Databricks Unity Catalog

Streamlining data documentation and discovery with Generative AI

Today, we are excited to announce the public preview of AI generated documentation in Databricks Unity Catalog. This feature leverages generative AI to simplify the documentation, curation, and discovery of your organization's data and AI assets by automating the addition of descriptions and comments for tables and columns. 

Data is the foundation of informed decision-making, and teams can only collaborate on it effectively when it is easy to discover and understand. Yet data teams often face a basic problem: tables and columns lack comprehensive descriptions, so users lack the context they need to make full use of the data.

Missing metadata and descriptions for tables and columns lead to several challenges:

  • Data ambiguity: The lack of clarity surrounding the purpose and content of tables and columns can significantly hinder users' decision-making capabilities.
  • Manual burden: Data owners shoulder the responsibility of manually appending descriptions and comments to furnish essential context for their assets, a crucial requirement for fostering collaboration among teams.
  • Inefficient data exploration: Users frequently find themselves compelled to rely on complex queries to extract insights from the data, leading to the consumption of valuable time and resources.
  • Poor data quality: Inadequate or inaccurate documentation can give rise to misunderstandings, data errors, and compromised data quality. IDC estimates that data analysts spend up to 80% of their time preparing and cleaning data, work that often stems from inadequate documentation, including missing descriptions.

Enhancing efficiency and accelerating insights with AI generated documentation in Unity Catalog

To address these challenges and assist in scenarios where data owners might lack sufficient context to add descriptions, Unity Catalog now suggests descriptions for tables and columns. Users can opt to accept these suggestions or adjust them as needed, ensuring an assistive and user-friendly experience. 

How it Works

  • Data exploration: When users navigate to the Catalog Explorer and access a table they own or manage, they are presented with auto-generated metadata for the table and its columns.
  • User review and editing: Users can review, edit, or accept the generated metadata. This step ensures that the descriptions align with the specific use case and domain knowledge.
  • Metadata storage: Once the user approves the generated documentation, it is saved within Unity Catalog. This documentation can then support data consumers in various ways, such as efficient search based on the auto-generated description.
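Catalog Explorer saves approved descriptions for you, so no code is required. For readers who want to see what the stored result corresponds to, here is a minimal sketch of the same table and column comments written as Databricks SQL `COMMENT ON TABLE` and `ALTER TABLE ... ALTER COLUMN ... COMMENT` statements. The `ApprovedDocs` container, the helper functions, and the `main.sales.orders` table are hypothetical names used only for illustration. The sketch also assumes backslash escaping inside string literals and table names whose parts contain no dots.

```python
from dataclasses import dataclass, field


def quote_identifier(name: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(text: str) -> str:
    """Single-quote a SQL string literal (assumes backslash escaping)."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


@dataclass
class ApprovedDocs:
    """Descriptions a reviewer has accepted or edited (hypothetical container)."""

    table: str  # three-level name: catalog.schema.table
    table_comment: str
    column_comments: dict[str, str] = field(default_factory=dict)


def comment_statements(docs: ApprovedDocs) -> list[str]:
    """Build one SQL statement for the table and one per documented column."""
    # Assumption: no part of the table name contains a dot.
    full_name = ".".join(quote_identifier(part) for part in docs.table.split("."))
    statements = [
        f"COMMENT ON TABLE {full_name} IS {quote_string(docs.table_comment)}"
    ]
    for column, comment in docs.column_comments.items():
        statements.append(
            f"ALTER TABLE {full_name} ALTER COLUMN {quote_identifier(column)} "
            f"COMMENT {quote_string(comment)}"
        )
    return statements


# Hypothetical table and descriptions, for illustration only.
example = ApprovedDocs(
    table="main.sales.orders",
    table_comment="One row per customer order.",
    column_comments={"order_id": "Unique identifier of the order."},
)
for statement in comment_statements(example):
    print(statement)
```

The statements are only built as strings here. Running them would require a SQL session with permission to modify the table, for example `spark.sql(statement)` in a Databricks notebook.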

Using AI-powered documentation in Unity Catalog offers several advantages:

  • Time and resource efficiency: The automation of documentation generation saves time and reduces the manual effort required for data description.
  • Simplified data exploration: Users can quickly understand the content and purpose of tables and columns, reducing the need for complex queries.
  • Enhanced data clarity: Accurate and comprehensive descriptions help ensure data clarity and prevent misunderstandings.
  • Improved Databricks search: The generated metadata supports table search within your workspace, improving the discoverability of relevant data for all your data use cases.
  • User control: Users retain control over the documentation process, with the ability to edit and customize descriptions to better match their specific requirements.

AI for governance in Unity Catalog

Unity Catalog allows organizations to securely discover, access, monitor, and collaborate on files, tables, ML models, notebooks, and dashboards across any data platform or cloud, while also leveraging AI to boost productivity and unlock the full potential of the lakehouse environment. AI-generated documentation is an integral part of our product roadmap, which aims to use AI to enhance governance workflows and operational efficiency. With features such as LakehouseIQ and Lakehouse Monitoring, organizations gain powerful data intelligence and monitoring capabilities. Additionally, Databricks Assistant, a context-aware AI assistant, makes operations more intuitive and responsive. This integration of AI technologies in Unity Catalog underscores our commitment to innovation and continuous improvement in delivering a state-of-the-art data and AI governance solution, natively integrated with the Lakehouse Platform.

Getting started

By making Unity Catalog the cornerstone of your Lakehouse architecture, you can build a flexible and scalable governance implementation that spans your entire data and AI estate. It's very easy to get started! If you already have Unity Catalog enabled in your workspace, navigate to tables you own or manage in Catalog Explorer to see the suggested descriptions. For more information, follow the Unity Catalog guides available for AWS, Azure, and GCP.

 
