
Responsible AI with the Databricks Data Intelligence Platform

Deliver AI that you can trust

The transformative potential of artificial intelligence (AI) is undeniable. From productivity gains to cost savings and improved decision-making across industries, AI is revolutionizing value chains. The advent of Generative AI since late 2022, particularly with the launch of ChatGPT, has further ignited market interest and enthusiasm for this technology. According to McKinsey & Company, the economic potential of Generative AI, including use cases and worker productivity enabled by AI, could add between $17 trillion and $26 trillion to the global economy.

As a result, more and more organizations are now focusing on implementing AI as a core tenet of their business strategy to build a competitive advantage. Goldman Sachs Economic Research estimates that AI investment could approach $100 billion in the U.S. and $200 billion globally by 2025.

However, as organizations embrace AI, it is crucial to prioritize responsible AI practices that cover quality, security, and governance to establish trust in their AI goals. According to Gartner, AI trust, risk, and security management is the #1 strategic technology trend for 2024 that will factor into business and technology decisions. By 2026, organizations that operationalize AI transparency, trust, and security will see their AI models achieve a 50% increase in adoption, user acceptance, and realization of business goals.

Moreover, as AI regulation picks up globally, organizations should treat compliance with these regulations as part of their responsible AI strategy. In our previous blog on AI regulations, we discussed the recent surge in AI policymaking in the U.S. and other countries, emphasizing the common regulatory themes emerging worldwide. In this blog, we take a deep dive into how the Databricks Data Intelligence Platform can help customers meet emerging obligations around responsible AI.

Core challenges in responsible AI: Trust, Security, and Governance

Lack of visibility into model quality: Insufficient visibility into the consequences of AI models has become a prevailing challenge. Companies grapple with a lack of trust in the reliability of AI models to consistently deliver outcomes that are safe and fair for their users. Without clear insights into how these models function and the potential impacts of their decisions, organizations struggle to build and maintain confidence in AI-driven solutions.

Inadequate security safeguards: Interactions with AI models expand an organization's attack surface by providing a new way for bad actors to interact with data. Generative AI is particularly problematic, as a lack of security safeguards can allow applications like chatbots to reveal (and in some cases to potentially modify) sensitive data and proprietary intellectual property. This vulnerability exposes organizations to significant risks, including data breaches and intellectual property theft, necessitating robust security measures to protect against malicious activities.

Siloed governance: Organizations frequently deploy separate data and AI platforms, creating governance silos that result in limited visibility and explainability of AI models. This disjointed approach leads to inadequate cataloging, monitoring, and auditing of AI models, impeding the ability to guarantee their appropriate use. Furthermore, a lack of data lineage complicates understanding of which data is being utilized for AI models and obstructs effective oversight. Unified governance frameworks are essential to ensure that AI models are transparent, traceable, and accountable, facilitating better management and compliance.

Building AI responsibly with the Databricks Data Intelligence Platform

Responsible AI practices are essential to ensure that AI systems are high-quality, safe, and well-governed. Quality considerations should be at the forefront of AI development, ensuring that AI systems avoid bias, and are validated for applicability and appropriateness in their intended use cases. Security measures should be implemented to protect AI systems from cyber threats and data breaches. Governance frameworks should be established to promote accountability, transparency, and compliance with relevant laws and regulations.

Databricks believes that the advancement of AI relies on building trust in intelligent applications by following responsible practices in the development and use of AI. This requires that every organization has ownership and control over their data and AI models with comprehensive monitoring, privacy controls and governance throughout the AI development and deployment. To achieve this mission, the Databricks Data Intelligence Platform allows you to unify data, model training, management, monitoring, and governance of the entire AI lifecycle. This unified approach empowers organizations to meet responsible AI objectives that deliver model quality, provide more secure applications, and help maintain compliance with regulatory standards.

“Databricks empowers us to develop cutting-edge generative AI solutions efficiently - without sacrificing data security or governance.” 
— Greg Rokita, Vice President of Technology, Edmunds
“Azure Databricks has enabled KPMG to modernize the data estate with a platform that powers data transformation, analytics and AI workloads, meeting our emerging AI requirements across the firm while also reducing complexity and costs.” 
— Jodi Morton, Chief Data Officer, KPMG

End-to-end quality monitoring for data and AI

Responsible AI development and deployment hinges on establishing a comprehensive quality monitoring framework that spans the entire lifecycle of AI systems. This framework is essential for ensuring that AI models remain trustworthy and aligned with their intended use cases from development through post-deployment. To achieve this, three critical aspects of model quality must be addressed: transparency, effectiveness, and reliability. 

  • Transparency is fundamental to building confidence in AI systems and meeting regulatory requirements. It involves making models explainable and interpretable, allowing stakeholders to understand how decisions are made. 
  • Effectiveness, on the other hand, focuses on the model's ability to produce accurate and appropriate outputs. During development, it is essential to track data quality, model performance metrics, and potential biases to identify and mitigate issues early on.  
  • Reliability ensures consistent performance over time, requiring continuous monitoring to prevent model degradation and avoid business disruptions. Monitoring involves tracking potential issues, such as changes in predictions, data distribution shifts, and performance degradation, allowing for quick intervention. Redeployment ensures that, after model updates or replacements, the business maintains high-quality outputs without downtime. Together, monitoring and redeployment are essential to sustaining model quality and reliability.
Foundational components of a generic data-centric AI system
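To make the reliability point concrete, drift checks typically compare a production distribution against a training-time baseline. Below is a minimal, pure-Python sketch of one common drift statistic, the population stability index (PSI). This is an illustration of the idea only, not a Databricks API; Lakehouse Monitoring computes drift metrics like this for you.

```python
import math

def psi(expected, actual, bins=10):
    """Population stability index between two samples of a numeric signal.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # degenerate case: all values identical

    def frac(sample, b):
        left = lo + b * width
        right = lo + (b + 1) * width
        n = sum(left <= x < right or (b == bins - 1 and x == hi) for x in sample)
        return max(n / len(sample), 1e-6)  # floor avoids log(0) on empty bins

    return sum(
        (frac(actual, b) - frac(expected, b))
        * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )

baseline = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
psi(baseline, baseline)  # identical distributions -> PSI of 0.0
```

A monitoring job would compute this per feature and per prediction column on a schedule and raise an alert when the index crosses a threshold.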

Transparency in AI: Confident deployment with comprehensive documentation

Automated data lineage: Tracing the origin and transformations of data is essential for compliance checks and detecting training data poisoning in AI lifecycle management. Delta Live Tables, built on Delta Lake, offers efficient and reliable data processing and transformation. A key feature of Delta Live Tables is data lineage tracking, which allows you to trace data origins and transformations throughout the pipeline. This visibility helps combat training data poisoning by enabling data versioning and anomaly detection to identify and mitigate issues. Delta Live Tables integrates seamlessly with MLflow and Unity Catalog, enabling you to track data lineage from initial sources to trained models. This integration supports reproducible data pipelines, ensuring consistent transformations across development, staging, and production environments, which is crucial for maintaining model accuracy and reliability. Furthermore, lineage information from Delta Live Tables facilitates automated compliance checks to ensure adherence to regulatory requirements and responsible AI practices.

Feature engineering: Features are curated input data used to train the model. The Databricks Feature Store provides a centralized repository for curating features, enabling reproducible feature computation and improving model accuracy. This centralization ensures consistent feature management and tracks feature lineage, guaranteeing that the same feature values used during training are used during inference. The feature store integrates natively with other Databricks components like Unity Catalog, allowing end-to-end lineage tracking from data sources to feature engineering, model creation, and deployment. As teams move to production, maintaining consistency between data sources for batch feature computation and real-time inference can be challenging. When models are trained with features from the feature store, Unity Catalog automatically tracks and displays the tables and functions used for model creation, along with the feature versions.

Experiment tracking: Databricks managed MLflow offers comprehensive experiment tracking capabilities, logging all relevant metadata associated with AI experiments, including source code, data, models, and results. This tracking provides valuable insights into model performance, guiding improvements and iterations during development. MLflow supports functionalities such as experiment tracking, run management, and notebook revision capture, enabling teams to measure and analyze ML model training runs effectively. It allows the logging of model training artifacts like datasets, models, hyperparameters, and evaluation metrics, both standard and custom-defined, including fairness and bias checks. The MLflow Tracking component logs source properties, parameters, metrics, tags, and artifacts related to training an ML model, providing a comprehensive view of the experiment. Databricks Autologging extends this capability by enabling automatic, no-code experiment tracking for ML training sessions on the Databricks Platform. Combined with Delta Live Tables for data lineage tracking, MLflow offers versioning and anomaly detection, allowing teams to combat training data poisoning and ensure compliance with regulatory and responsible AI obligations.

AI-powered documentation: Databricks offers AI-powered documentation for data and ML models in Unity Catalog. This functionality streamlines the documentation process by utilizing large language models (LLMs) to automatically create documentation for tables, ML models, and columns within Unity Catalog. It also provides textual responses to natural language queries about your data, thereby simplifying the documentation of the data utilized by your model.

Traceable compound AI systems: Bringing together the power and user-friendly interface of generative AI with the explainable, reproducible results of traditional machine learning or discrete functions provides a more transparent and reliable overall AI architecture. Tools are a means by which LLMs can interact with other systems and applications in codified ways like calling APIs or executing existing queries. Mosaic AI Tools Catalog lets organizations govern, share, and register tools using Databricks Unity Catalog for use in their compound AI systems. Further, generative AI models registered in MLflow, including tool-enabled LLMs, can be easily traced for full explainability. Each step of retrieval, tool usage and response, and references are available for every logged request/call. 

AI Effectiveness: Automating evaluation and selection of AI models for appropriate use

Model evaluation: Model evaluation is a critical component of the ML lifecycle and highly relevant to meeting applicable AI regulatory obligations. Databricks Managed MLflow plays a central role in model development by offering insights into the reasons behind a model's performance and guiding improvements and iterations. MLflow offers many industry-standard native evaluation metrics for classical ML algorithms and LLMs and also facilitates the use of custom evaluation metrics. Databricks Managed MLflow provides a number of features to assist in evaluating and calibrating models, including the MLflow Model Evaluation API, which helps with model and dataset evaluation, and MLflow Tracking, which lets a user log source properties, parameters, metrics, tags, and artifacts related to training an ML model. Used with lineage tracking, Managed MLflow also provides versioning and anomaly detection. Databricks Autologging is a no-code solution that extends MLflow Tracking's automatic logging to deliver automatic experiment tracking for ML training sessions on Databricks. MLflow Tracking also tracks model files so a user can easily log them to the MLflow Model Registry and deploy them for real-time scoring with Model Serving.
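Custom evaluation metrics, such as a fairness check, can be computed alongside the standard ones and logged with the rest of a run. Below is a plain-Python sketch of a two-group demographic parity check; it is an illustration of a custom metric, not an MLflow built-in, and it assumes exactly two groups.

```python
def demographic_parity_difference(preds, groups):
    """Absolute gap in positive-prediction rate between exactly two groups.

    preds: iterable of 0/1 predictions; groups: parallel iterable of group labels.
    A value near 0 suggests the model treats both groups similarly on this axis.
    """
    rates = {}
    for g in set(groups):
        member_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(member_preds) / len(member_preds)
    a, b = rates.values()  # assumes exactly two groups are present
    return abs(a - b)

# Group "a" gets positive predictions 50% of the time, group "b" 100%.
demographic_parity_difference([1, 0, 1, 1], ["a", "a", "b", "b"])  # -> 0.5
```

A metric like this can then be logged per run via `mlflow.log_metric` and compared across model candidates.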

LLM evaluation and guardrails: In addition to MLflow, the Databricks Data Intelligence Platform offers an AI playground for LLM evaluation as part of Databricks Mosaic AI. This allows you to test and compare LLM responses, helping you determine which foundation model works best for your environment and use case. You can enhance these foundation models with filters using our AI guardrails to protect against interaction with toxic or unsafe content. To filter on custom categories, define custom functions using Databricks Feature Serving (AWS | Azure) for custom pre- and post-processing. For example, to filter data that your company considers sensitive from model inputs and outputs, wrap any business rule or function and deploy it as an endpoint using Feature Serving. Additionally, safeguard models like Llama Guard and Llama Guard 2 are available on the Databricks Marketplace. These open source tools are free to use, helping you create an LLM that acts as both a judge and a guardrail against inappropriate responses.

The Databricks Mosaic Inference platform allows users to reuse pretrained generative AI models and adapt them to new tasks, enabling transfer learning to build accurate and reliable models with smaller amounts of training data, thus improving the model's generalization and accuracy. Mosaic Inference offers a range of model types and sizes. To limit hallucinations and similar model risks, customers can build smaller, performant models that they control in their own environment on their own data. Full control over data provenance reduces the risk of models hallucinating based on erroneous knowledge learned during pretraining. It also reduces the likelihood of hallucinations by constraining the language on which the model is trained to representative, relevant samples.
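The kind of business rule one might wrap and deploy behind an endpoint can be sketched in a few lines of plain Python. The patterns and wrapper below are hypothetical examples, not a Databricks API; they just show input and output filtering around an arbitrary model callable.

```python
import re

# Hypothetical business rule: treat anything matching these patterns as sensitive.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like identifiers
    re.compile(r"\b\d{16}\b"),             # bare 16-digit card numbers
]

def redact(text):
    """Replace every sensitive match with a placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def guarded(model_fn):
    """Wrap any model callable so both inputs and outputs are filtered."""
    def wrapper(prompt):
        return redact(model_fn(redact(prompt)))
    return wrapper

# A stand-in model that simply echoes its prompt.
echo = guarded(lambda p: f"You said: {p}")
echo("My SSN is 123-45-6789")  # -> "You said: My SSN is [REDACTED]"
```

Deployed via Feature Serving, a function like `guarded` runs the same pre- and post-processing on every request to the model endpoint.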
When selecting, training, or fine-tuning a model, customers can also take advantage of the built-in Mosaic Eval Gauntlet benchmark suite, which runs models through an array of industry-standard language evaluation tasks to benchmark model performance across multiple dimensions.

Feature evaluation: The “features” of a model are paramount to its quality, accuracy, and reliability. They directly impact risk and are therefore of utmost importance when seeking to meet AI regulatory obligations. Databricks feature store ensures reproducible feature computation, essential for addressing online/offline skew in ML deployments. This skew, arising from discrepancies between training and inference data sources, can significantly impact model accuracy. Databricks feature store mitigates this issue by tracking feature lineage and facilitating collaboration across teams managing feature computation and ML models in production.
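The principle the feature store enforces can be shown in miniature: define the feature computation once and call the identical function from both the batch training path and the online scoring path. This is a plain-Python sketch of the idea; in Databricks the equivalent is handled through feature tables and on-demand feature functions.

```python
from datetime import datetime, timezone

def account_age_days(signup_ts, now_ts):
    """Feature computed the same way for batch training and online inference.

    Both paths call this one definition, so the feature logic cannot drift
    between the offline and online environments (the source of skew).
    """
    return (now_ts - signup_ts) / 86400.0

# Batch (training) and online (serving) code both reuse the function above.
signup = datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp()
now = datetime(2024, 1, 31, tzinfo=timezone.utc).timestamp()
account_age_days(signup, now)  # -> 30.0
```

When the definition lives in one governed place, the training set and the live request see the same value for the same inputs, which is exactly the guarantee the feature store provides at scale.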

AI Reliability: Ensuring seamless monitoring and iteration

Model monitoring: Monitoring models in production is crucial for ensuring ongoing quality and reliability. With Databricks Lakehouse Monitoring, you can continuously assess the performance of your models, scanning application outputs to detect any problematic content. This includes monitoring for fairness and bias in sensitive AI applications like classification models. The platform helps quickly identify issues such as model drift due to outdated data pipelines or unexpected model behavior. Key features include customizable dashboards, real-time alerts, flexible observation time frames, audit logs, and the option to define custom metrics. Additionally, it offers PII detection for enhanced data security. Lakehouse Monitoring, in conjunction with lineage tracking from Unity Catalog, accelerates threat response, facilitates faster issue resolution, and enables thorough root cause analysis. Databricks Inference Tables automatically capture and log incoming requests and model responses as Delta tables in Unity Catalog. This data is invaluable for monitoring, debugging, and optimizing ML models post-deployment.

Lakehouse Monitoring Dashboard

Additionally, the Mosaic Training platform, including the Mosaic LLM Foundry suite of training tools, and the Databricks RAG Studio tools, can be used to assess and tune models post-launch to mitigate identified issues. The Patronus AI EnterprisePII automated AI evaluation tool included in the LLM Foundry can be useful to detect the presence of a customer’s business sensitive information as part of model security post-release. Toxicity screening and scoring are also incorporated within RAG Studio. The Mosaic Eval Gauntlet benchmarking tool can be used to assess model performance on an ongoing basis.

“Lakehouse Monitoring has been a game changer. It helps us solve the issue of data quality directly in the platform. It's like the heartbeat of the system. Our data scientists are excited they can finally understand data quality without having to jump through hoops." 
— Yannis Katsanos, Director of Data Science, Ecolab

Model serving and iteration: Databricks Model Serving, a serverless solution, provides a unified interface for deploying, governing, and querying AI models with secure-by-default REST API endpoints. The Model Serving UI enables centralized management of all model endpoints, including those hosted externally. This platform supports live A/B testing, allowing you to compare model performance and switch to more effective models seamlessly. Automatic version tracking ensures that your endpoints remain stable while iterating on your models behind the scenes.
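As a sketch of what querying such an endpoint looks like, the snippet below builds a scoring request with only the standard library. The workspace URL and token are placeholders; the payload shape (`dataframe_records`) follows the Model Serving scoring API, while the feature names are hypothetical.

```python
import json
import urllib.request

def build_scoring_request(endpoint_url, token, records):
    """Build a POST request for a Databricks Model Serving endpoint.

    endpoint_url and token are placeholders supplied by the caller; the
    body uses the serving API's "dataframe_records" payload format.
    """
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_scoring_request(
    "https://<workspace>/serving-endpoints/my-model/invocations",  # placeholder
    "<token>",                                                     # placeholder
    [{"feature_a": 1.2, "feature_b": 0.4}],
)
# urllib.request.urlopen(req) would return the model's predictions as JSON.
```

Because the endpoint name stays stable while versions change behind it, the same request code keeps working as you iterate on the model.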

Additionally, Databricks AI Gateway centralizes governance, credential management, and rate limits for model APIs, including SaaS LLMs, through Gateway Routes (with each route representing a model from a specific vendor). AI Gateway offers a stable endpoint interface, enabling smooth model updates and testing without disrupting business operations.

Unified security for data and AI

With the rise of AI, concerns about security are also increasing. In fact, 80% of data experts believe AI increases data security challenges. Recognizing this, security has become a foundational element of the Databricks Data Intelligence Platform. We offer robust security controls to safeguard your data and AI operations, including encryption, network controls, data governance, and auditing. These protections extend throughout the entire AI lifecycle—from data and model operations to model serving.

To help our customers navigate the ever-evolving landscape of AI security threats, Databricks has developed a comprehensive list of 55 potential risks across the twelve components of an end-to-end AI system. For each identified risk, we provide detailed and actionable recommendations as part of the Databricks AI Security Framework (DASF) to mitigate it using the Databricks Data Intelligence Platform. By leveraging these robust security measures and risk mitigation strategies, you can confidently build, deploy, and manage your AI systems while maintaining the highest levels of security.

While many of the risks associated with AI may, on the surface, seem unrelated to cybersecurity (e.g., fairness, transparency, reliability, etc.), canonical controls that have been managed by cybersecurity teams (e.g., authentication, access control, logging, monitoring, etc.) for decades can be deployed to mitigate many non-cybersecurity risks of AI. Therefore, cybersecurity teams are uniquely positioned to play an outsized role in ensuring the safe and responsible use of AI across organizations.

Databricks AI Security Framework
"When I think about what makes a good accelerator, it's all about making things smoother, more efficient and fostering innovation. The DASF is a proven and effective tool for security teams to help their partners get the most out of AI. Additionally, it lines up with established risk frameworks like NIST, so it's not just speeding things up – it's setting a solid foundation in security work." 
— Riyaz Poonawala, Vice President of Information Security, Navy Federal Credit Union

Unified governance for Data and AI

Governance serves as a foundational pillar for responsible AI, ensuring ethical and effective use of data and machine learning (ML) models through:

  • Access management: Implementing strict policies to manage who can access data and ML models, fostering transparency and preventing unauthorized use.
  • Privacy safeguards: Implementing measures to protect individuals' data rights, supporting compliance with privacy regulations and building trust in AI systems.
  • Automated lineage and audit: Establishing mechanisms to track data and model provenance, enabling traceability, accountability, and compliance with AI regulatory standards.

Databricks Unity Catalog is an industry-leading unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. With Unity Catalog, organizations can seamlessly govern both structured and unstructured data in any format, as well as machine learning models, notebooks, dashboards and files across any cloud or platform.

“Databricks Unity Catalog is now an integral part of the PepsiCo Data Foundation, our centralized global system that consolidates over 6 petabytes of data worldwide. It streamlines the onboarding process for more than 1,500 active users and enables unified data discovery for our 30+ digital product teams across the globe, supporting both business intelligence and artificial intelligence applications.”
— Bhaskar Palit, Senior Director, Data and Analytics, PepsiCo
Unity Catalog provides unified governance for data and AI

Access management for data and AI

Unity Catalog helps organizations centralize and govern their AI resources, including ML models, AI tools, feature stores, notebooks, files, and tables. This unified approach enables data scientists, analysts, and engineers to securely discover, access, and collaborate on trusted data and AI assets across different platforms. With a single permissions model, data teams can manage access policies using a unified interface for all data and AI resources. This simplifies access management, reduces the risk of data breaches, and minimizes the operational overhead associated with managing multiple access tools and discovery processes. Additionally, comprehensive auditability allows organizations to have full visibility into who did what and who can access what, further enhancing security and compliance.

Furthermore, Unity Catalog offers open APIs and standard interfaces, enabling teams to access any resource managed within the catalog from any compute engine or tool of their choice. This flexibility helps mitigate vendor lock-in and promotes seamless collaboration across teams.

Fine-tune privacy

Auto-classification and fine-grained access controls: Unity Catalog enables you to classify data and AI assets using tags and automatically classify personally identifiable information (PII). This ensures that sensitive data isn't inadvertently used in ML model development or production. Attribute-based access controls (ABAC) allow data stewards to set policies on data and AI assets using criteria like user-defined tags, workspace details, location, identity, and time. Whether it's restricting sensitive data to authorized personnel or adjusting access dynamically based on project needs, ABAC ensures security measures are applied with precision. Additionally, row filtering and column masking features enable teams to implement fine-grained access controls on data, preserving data privacy during the creation of AI applications.
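As an illustration of column-masking logic: in Unity Catalog the rule is written as a SQL function attached to a column, but the same idea can be shown in plain Python. The masking format below is a hypothetical example of such a rule.

```python
def mask_email(value, is_privileged):
    """Column-mask rule: privileged users see the value, others a masked form.

    In Unity Catalog this would be a SQL masking function attached to the
    email column; this plain-Python version just illustrates the logic.
    """
    if is_privileged:
        return value
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain

mask_email("jane.doe@example.com", is_privileged=False)  # -> "j***@example.com"
```

Because the mask is attached to the column itself, every query path, from notebooks to BI dashboards to model training jobs, sees the same policy-appropriate view of the data.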

Privacy-safe collaboration with Databricks Clean Rooms: Building AI applications today necessitates collaboration across organizations and teams, along with a firm commitment to privacy and data security. Databricks Clean Rooms offers a secure environment for private collaboration on diverse data and AI tasks, spanning machine learning, SQL queries, Python, R, and more. Designed to facilitate seamless collaboration across different cloud and data platforms, Databricks Clean Rooms enables multi-party collaboration without compromising data privacy or security, letting organizations build scalable AI applications in a privacy-safe manner.

Automated lineage and auditing

Establishing frameworks to monitor the origins of data and models ensures traceability, accountability, and compliance with responsible AI standards. Unity Catalog provides end-to-end lineage across the AI lifecycle, enabling compliance teams to trace the lineage from ML models to features and underlying training data, down to the column level. This feature supports organizational compliance and audit readiness, streamlining the process of documenting data flow trails for audit reporting and reducing operational overhead. Additionally, Unity Catalog provides robust out-of-the-box auditing features, empowering AI teams to generate reports on AI application development, data usage, and access to ML models and underlying data.

End-to-end AI lineage with Unity Catalog

Next Steps

Try Databricks for free
