by Omar Khawaja, Arun Pamulapati and Kelly Albano
Machine Learning (ML) and generative AI (GenAI) are revolutionizing the future of work. Organizations understand that AI helps them drive innovation, maintain competitiveness, and improve employee productivity. Equally, organizations understand that their data provides a competitive advantage for their AI applications. Leveraging these technologies presents opportunities but also potential risks, as embracing them without proper safeguards can result in significant intellectual property and reputational loss.
In our conversations with customers, they frequently cite risks such as data loss, data poisoning, model theft, and compliance and regulation challenges. Chief Information Security Officers (CISOs) are under pressure to adapt to business needs while mitigating these risks swiftly. However, if CISOs say no to the business, they are perceived as not being team players and not putting the enterprise first. On the other hand, they are perceived as careless if they say yes to doing something risky. Not only do CISOs need to keep up with the business's appetite for growth, diversification, and experimentation, but they also have to keep up with the explosion of technologies promising to revolutionize their business.
In this blog, we will discuss the security risks CISOs need to know as their organization evaluates, deploys, and adopts enterprise AI applications.
At Databricks, we believe data and AI are your most precious non-human assets, and that the winners in every industry will be data and AI companies. That's why security is embedded in the Databricks Data Intelligence Platform. The Databricks Platform allows your entire organization to use data and AI. It's built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data.
Our Databricks Security team works with thousands of customers to securely deploy AI and machine learning on Databricks with the appropriate security features that meet their architecture requirements. We work with dozens of experts internally at Databricks and in the larger ML and GenAI community to identify security risks to AI systems and define the controls necessary to mitigate those risks. We have reviewed numerous AI and ML risk frameworks, standards, recommendations, and guidance. As a result, we have robust AI security guidelines to help CISOs, security leaders, and data teams understand how to deploy their organizations' ML and AI applications securely. However, before discussing the risks to ML and AI applications, let's walk through the constituent components of an AI system used to manage the data, build models, and serve applications.
AI systems are composed of data, code, and models. A typical system has 12 foundational architecture components, broadly categorized into four major stages: data operations, model operations, model deployment and serving, and operations and platform.
MLOps is a set of processes and automated steps for managing the AI system's code, data, and models. MLOps should be combined with security operations (SecOps) practices to secure the entire ML lifecycle. This includes protecting the data used to train and test models, as well as defending deployed models and the infrastructure they run on against malicious attacks.
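To make this concrete, below is a minimal sketch of one MLOps step with basic SecOps hygiene built in: every model version is logged and registered so its lineage is traceable and auditable. It assumes an MLflow tracking server is configured; the model name churn-classifier and the toy dataset are purely illustrative.

```python
# Minimal MLOps + SecOps sketch: log and register every model version
# so deployed models are traceable back to their code and parameters.
# Assumes an MLflow tracking server is configured; names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=42)

with mlflow.start_run() as run:
    model = LogisticRegression().fit(X, y)
    # Record training parameters alongside the artifact for auditability.
    mlflow.log_param("solver", model.solver)
    mlflow.sklearn.log_model(model, "model")
    # Registering the model places it behind the registry's access
    # controls instead of ad hoc artifact sharing.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-classifier")
```

A central registry, rather than models passed around as loose files, gives SecOps a single place to apply access controls and audit trails across the model lifecycle.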
In our analysis of AI and ML systems, we identified 55 technical security risks across the 12 components. The table below groups these components by system stage and highlights the security risks our team identified at each stage:
| System stage | System components (Figure 1) | Potential security risks |
| --- | --- | --- |
| Data operations | See Figure 1 | 19 specific risks |
| Model operations | See Figure 1 | 14 specific risks |
| Model deployment and serving | See Figure 1 | 15 specific risks |
| Operations and Platform | See Figure 1 | 7 specific risks |
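To illustrate one of the data operations risks above, here is a hedged sketch of a basic control against crude data poisoning: validating a training batch's schema and label set before the pipeline accepts it. The column names, dtypes, and label set are hypothetical.

```python
# Hypothetical data operations control: reject a training batch whose
# schema or labels drift from what the pipeline expects, a cheap guard
# against crude poisoning attempts such as label flipping.
import pandas as pd

EXPECTED_COLUMNS = {"feature_a": "float64", "feature_b": "float64", "label": "int64"}
VALID_LABELS = {0, 1}

def validate_training_batch(df: pd.DataFrame) -> None:
    # Reject batches whose schema drifts from the expected contract.
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in df.columns or str(df[col].dtype) != dtype:
            raise ValueError(f"unexpected schema for column {col!r}")
    # Reject labels outside the known set.
    if not df["label"].isin(VALID_LABELS).all():
        raise ValueError("labels outside the expected set; batch rejected")
```

Checks like this will not stop a determined adversary, but they raise the bar and create an audit point in the data pipeline.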
Databricks has mitigation controls for all of the risks outlined above. In our Databricks AI Security Framework, we walk through the complete list of risks, along with guidance on the associated controls and the out-of-the-box capabilities, such as Databricks Delta Lake, Databricks Managed MLflow, Unity Catalog, and Model Serving, that you can use to mitigate them.
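As one example, Unity Catalog can enforce least-privilege access to the tables that feed training and serving. The sketch below assumes a Databricks notebook, where spark is predefined; the catalog, schema, table, and principal names are illustrative.

```python
# Illustrative Unity Catalog grants (Databricks notebook; `spark` is
# predefined there). Table and principal names are hypothetical.

# Only the ML engineering group may read the training data.
spark.sql("GRANT SELECT ON TABLE main.fraud.training_data TO `ml-engineers`")

# Only the serving service principal may read the features used at inference.
spark.sql("GRANT SELECT ON TABLE main.fraud.features TO `model-serving-sp`")
```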
In addition to the technical risks, our discussions with CISOs have highlighted the necessity of addressing four organizational risk areas to ease the path to AI and ML adoption. These are key to aligning the security function with the needs, pace, outcomes, and risk appetite of the business it serves.
CISOs are instinctive risk assessors. However, this superpower fails most CISOs when it comes to AI. The primary reason is that CISOs don't have a simple mental model of an AI system that they can readily visualize to synthesize assets, threats, impacts, and controls.
To help you with this, the Databricks Security team has designed an AI security workshop for CISOs and security and data leaders. These workshops are built around our Databricks AI Security Framework, offering an interactive experience that facilitates structured discussions on security threats and mitigations. They are designed to be accessible, requiring no deep expertise in machine learning concepts.
As a sneak peek, the workshops cover the top-line approach we recommend for managing the technical security risks of ML and AI applications at scale.
If you're interested in participating in one of our upcoming AI Security workshops, or in hosting one for your organization, contact [email protected].
If you are curious about how Databricks approaches security, please visit our Security and Trust Center.