Prisma Cloud is the leading cloud security platform, providing comprehensive code-to-cloud visibility into your risks and incidents along with key remediation capabilities to manage and monitor your code-to-cloud journey. Today, the platform secures more than 1B assets and workloads across code to cloud globally. It secures some of the most demanding environments, with customers who have tens of thousands of cloud accounts that see constant mutations and configuration changes at a scale of trillions every hour.
In this blog, we review Prisma Cloud's historical approach to building data and AI into our products, the challenges we ran into with our existing data platform, and how, with the Databricks Data Intelligence Platform, Prisma Cloud has achieved transformative, enterprise-wide impact for both our customers and internal teams.
Prisma Cloud’s focus was to offer best-of-breed solutions within each segment/module and then provide value-added security features that help tie signals from different modules to deliver deeper capabilities as a platform offering. Some examples include:
Prisma Cloud comprises over 10 modules, each best of breed in its security features and each feeding signals into the platform. Customers can adopt the platform for a vertical need (e.g., vulnerability management) or for the whole suite. The platform approach encourages customers to explore adjacent areas, increasing overall value and driving greater stickiness.
Prisma Cloud's technical challenge is fundamentally a data challenge. With our rapid module expansion, driven by both organic innovation and M&As, developing a unified data strategy from scratch was a demanding task. However, the vision was clear: without a solution to consolidate all data in one place, we couldn't fully deliver the capabilities our customers need while harnessing the power of best-of-breed modules.
As one of the largest adopters of GenAI, Palo Alto Networks has built its AI strategy around three key pillars: leveraging AI to enhance security offerings, securing AI to help customers protect their AI usage, and optimizing user experience through AI-driven copilots and automation. See PrecisionAI for more details.
Palo Alto Networks and Prisma Cloud had a strong history of deep AI/ML usage across multiple products and features long before the GenAI wave reshaped the industry. However, the rapid evolution of AI capabilities accelerated the need for a long-term, comprehensive data strategy.
We chose the Databricks Data Intelligence Platform as the best fit for our strategic direction and requirements, as it encompassed all the critical aspects needed to support our vision. With Databricks, we’ve significantly accelerated our data consolidation efforts and scaled innovative use cases—delivering measurable customer benefits within just six months of rollout.
In just the first year of integrating Databricks, Palo Alto Networks achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams. By centralizing data workflows on the Databricks Platform, we significantly reduced complexity and accelerated innovation, enabling us to iterate on AI/ML features three times faster than before. Alongside this increased speed, we realized a 20% reduction in cost of goods sold and a 3x decrease in engineering development time.
Enhanced collaboration, fueled by Databricks Workflows, Databricks Unity Catalog for unified governance, and Databricks Auto Loader, has allowed us to deliver security solutions at unprecedented speed and scale. This has dramatically accelerated Prisma Cloud's data processing and enabled us to bring impactful features to market faster than ever before.
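For a flavor of what this looks like in practice, here is a minimal Auto Loader sketch for incrementally ingesting newly landed files from S3 into a Unity Catalog-governed Delta table. The bucket paths and table names are hypothetical placeholders, not our production setup:

```python
# Minimal Auto Loader sketch: incrementally ingest newly landed security
# event files from S3 into a Delta table governed by Unity Catalog.
# All paths and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("cloudFiles")                               # Auto Loader source
    .option("cloudFiles.format", "json")                # raw events land as JSON
    .option("cloudFiles.schemaLocation",
            "s3://example-bucket/_schemas/events")      # schema tracking/evolution
    .load("s3://example-bucket/raw/events/")
    .writeStream
    .option("checkpointLocation",
            "s3://example-bucket/_checkpoints/events")  # exactly-once progress tracking
    .trigger(availableNow=True)                         # process available files, then stop
    .toTable("security.bronze.events"))                 # Unity Catalog-governed table
```

A pipeline like this can then be scheduled and monitored as a Databricks Workflows task rather than through bespoke tooling.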
Prisma Cloud runs most of its infrastructure on AWS with a mature engineering tech stack built around AWS native services. Our team had extensive experience leveraging Apache Spark for ETL and analytical processing, running our infrastructure on AWS Glue and EMR.
Recognizing the need for a dedicated data platform, we initially developed a homegrown solution with EMR, Glue, and S3 as its foundation. While this approach worked well with a small team, scaling it to support a broader data strategy and adoption across multiple teams quickly became a challenge. We found ourselves managing thousands of Glue jobs and multiple EMR clusters, all requiring enterprise-grade capabilities such as monitoring, alerting, reliability checks, and governance/security guardrails.
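To illustrate the kind of undifferentiated tooling this required, below is a hypothetical sketch, not our actual code, of the per-job failure monitoring we had to build and operate ourselves on top of boto3:

```python
# Illustrative only: the kind of bespoke monitoring we had to maintain
# per Glue job. Job names, topic ARNs, and the alerting path are hypothetical.
import boto3

glue = boto3.client("glue")
sns = boto3.client("sns")

def alert_on_failed_runs(job_name: str, topic_arn: str) -> None:
    """Scan the most recent runs of one Glue job and alert on any failure."""
    runs = glue.get_job_runs(JobName=job_name, MaxResults=10)["JobRuns"]
    for run in runs:
        if run["JobRunState"] in ("FAILED", "ERROR", "TIMEOUT"):
            sns.publish(
                TopicArn=topic_arn,
                Subject=f"Glue job {job_name} failed",
                Message=f"Run {run['Id']} ended in state {run['JobRunState']}",
            )
```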
As our needs grew, so did the operational overhead. A significant portion of our engineering effort was diverted to maintaining what had effectively become an “Operating System” for our data platform rather than focusing on innovation and value-driven use cases.
While this effort addressed our strategic needs, we soon ran into several challenges maintaining this version. Some of them are listed below:
Despite these challenges, our homegrown solution continued to scale, processing tens of millions of data mutations per hour for critical use cases. Looking ahead, however, we saw a clear need to migrate to a more mature platform, one that would allow us to retire in-house tooling and refocus engineering effort on securing our customers' cloud environments rather than managing infrastructure.
At Prisma Cloud, we follow an 8-factor rule for technical evaluations, assessing the advantages and disadvantages of each candidate technology. These factors are analyzed by our internal technical leadership committee, which discusses them until we reach a consensus. When a factor cannot be adequately rated, we gather additional data through business-relevant prototyping to ensure a well-informed decision.
The key factors are listed below:
One of our key long-term goals was the ability to move towards a security data mesh model. Given our platform approach, we categorize data into 3 fundamental types:
Unlike traditional data lakes, where Bronze data is often discarded, our platform’s breadth and depth necessitate a more evolutionary approach. Rather than simply transforming data into Gold datasets, we envision our data lake evolving into a data mesh, allowing for greater flexibility, accessibility, and cross-domain insights. The diagram below reflects the long-term capability that we seek to extract from our data lake investments.
All of our assessments were centered on this philosophy.
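To ground the model, here is a minimal sketch of how a retained Bronze table can be refined into a Gold view while remaining a first-class, queryable asset in its own right. The table and column names are hypothetical:

```python
# Sketch of the layered model described above: Bronze data is retained as a
# queryable asset rather than discarded after refinement into Gold.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.table("security.bronze.events")       # raw signals, kept long-term

gold = (bronze
    .filter(F.col("severity").isNotNull())           # refine: drop unscored events
    .groupBy("account_id", "severity")
    .agg(F.count("*").alias("finding_count")))       # correlated, consumable view

gold.write.mode("overwrite").saveAsTable("security.gold.findings_by_severity")
```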
Apart from checking all the boxes in our new technology evaluation framework, the following key insights further cemented Databricks as our preferred data platform.
| Criteria | EMR/Glue (or cloud provider native tech) | Databricks |
|---|---|---|
| Ease of deployment | Each team needs to write its own deployment code; generally a sprint of work. | One-time integration that teams then adopt; SRE work reduced to a few days. |
| Ease of administration | Maintaining versions and security patches generally takes SREs a few days. | SRE work is no longer needed. |
| Integrations | SREs need to set up Airflow and ksql (generally a sprint of work for each new team). | Out of the box |
| MLflow | Need to buy a tool or adopt open source; each team integrates it separately (a few months the first time, a sprint for each team after that). | Out of the box |
| Data catalog (requires data lineage, security, role-based access control, searchability, and data tagging) | Need to buy tools and integrate them with Prisma. | Out of the box |
| ML libraries and AutoML | Need to buy and integrate with Prisma. | Out of the box |
| Single pane of glass (SPOG) for developers and SREs | Not available with EMR/Glue. | Out of the box |
| Databricks SQL (SQL on S3 data) | Athena or Presto; SRE help needed to integrate with Prisma. | Out of the box |
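As one concrete illustration of the last row, ad-hoc SQL over S3-resident data needs no separate Athena or Presto integration. The sketch below assumes a hypothetical Delta table backed by S3 data:

```python
# Hypothetical example for the "Databricks SQL" row above: ad-hoc SQL over
# an S3-backed table with no separate query-engine integration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

top_accounts = spark.sql("""
    SELECT account_id, COUNT(*) AS open_alerts
    FROM security.bronze.events          -- hypothetical table over S3 data
    WHERE status = 'open'
    GROUP BY account_id
    ORDER BY open_alerts DESC
    LIMIT 10
""")
top_accounts.show()
```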
Our early pilots convinced us to start planning a migration path from our existing S3-based data lake onto the Databricks Platform. A perfect opportunity arose with a key insights project that required access to data from both the Raw and Correlated layers to uncover net-new security insights and optimize security problem resolution.
Before adopting Databricks, executing this type of project involved several complex and time-consuming steps:
We tested the impact of the Databricks Data Intelligence Platform on this critical project through the following steps:
This consolidation proved transformative. Within a single week of prototyping, we uncovered valuable insights by combining raw, processed, and correlated datasets, enabling a more productive evaluation of product-market fit. As a result, we gained clear direction on which customer challenges to pursue and a stronger understanding of the impact we could deliver.
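As a sketch of what such cross-layer analysis can look like, the hypothetical example below joins raw (Bronze) events with correlated (Gold) findings to surface signals neither layer exposes alone. All table and column names are illustrative:

```python
# Hypothetical cross-layer analysis: join raw events with correlated
# findings once both live on the same platform. Names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.table("security.bronze.events")
correlated = spark.table("security.gold.findings_by_severity")

insights = (raw
    .join(correlated, on="account_id", how="inner")
    .where(F.col("finding_count") > 100)     # focus on the noisiest accounts
    .select("account_id", "event_type", "finding_count")
    .distinct())

insights.show()
```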
Within just six months of partnering with Databricks, we introduced a pivotal security innovation for our customers—an achievement that would have been virtually impossible given our former technology stack, expansive customer base, and the need to prioritize core security features.
As the case study above shows, the timing of our growth aligned with Databricks' emergence as the leading data platform of choice. Our shared commitment to rapid innovation and scalability made this partnership a natural fit.
By reframing the technical challenge of cloud security as a data problem, we were able to seek out technology providers who were experts in this area. This strategic shift allowed us to focus on depth, leveraging Databricks’ powerful platform while applying our domain intelligence to tailor it for our scale and business needs. Ultimately, this collaboration has empowered us to accelerate innovation, enhance security insights, and deliver greater value to our customers.
Read more about the Databricks and Palo Alto Networks collaboration here.