
Power of Compounding on Top of a Unified Data Platform - Accelerating Innovation in Cloud Security

Published: March 25, 2025


Summary

  • Strategic Evolution: Prisma Cloud moved from a homegrown AWS solution to Databricks to handle massive scale and support advanced AI capabilities.
  • Measurable Impact: First-year results showed 3x faster development time, 20% cost reduction, and tripled speed in AI feature iteration.
  • Future-Ready Architecture: A new three-tier data structure (Raw, Processed, Correlated) laid the foundation for a flexible security data mesh across all modules.

Prisma Cloud is the leading Cloud Security platform, providing comprehensive code-to-cloud visibility into your risks and incidents and offering key remediation capabilities to manage and monitor your code-to-cloud journey. The platform today secures more than one billion assets and workloads from code to cloud globally, including some of the most demanding environments: customers with tens of thousands of cloud accounts that see constant mutations and configuration changes on the scale of trillions per hour.

Throughout this blog we will review Prisma Cloud’s historical approach to building data and AI into our products, the challenges we ran into with our existing data platform, and how, with the Databricks Data Intelligence Platform, Prisma Cloud has achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams.

Prisma Cloud’s focus was to offer best-of-breed solutions within each segment/module and then provide value-added security features that help tie signals from different modules to deliver deeper capabilities as a platform offering. Some examples include:

  • Addressing posture issues related to infrastructure configuration and management. Fixing these issues in code and fostering an automation mindset help prevent them in production. Combining our Posture Management offering with our Code Security offering was essential to ensure traceability and resolve issues directly in code.
  • Visualizing and managing controls through a platform ‘knowledge graph’ helps customers understand how resources and workloads are connected. This approach enables them to assess findings and identify paths that pose greater concerns for a SOC administrator. Aggregating all signals in one place is crucial for this process.

Prisma Cloud is set up with over 10 modules, each being best of breed in its security features and generating signals to the platform. Customers can choose to leverage the platform for their vertical needs (e.g. for vulnerability management) or for the whole suite. The platform approach encourages the customer to explore adjacent areas, increasing overall value and driving greater stickiness.

Prisma Cloud’s technical challenge is fundamentally a data challenge. With our rapid module expansion—driven by both organic innovation and M&As—developing a unified data strategy from scratch was a demanding task. However, the vision was clear: without a solution to consolidate all data in one place, we couldn’t fully deliver the capabilities our customers need while harnessing the power of best-of-breed modules.

As one of the largest adopters of GenAI, Palo Alto Networks has built its AI strategy around three key pillars: leveraging AI to enhance security offerings, securing AI to help customers protect their AI usage, and optimizing user experience through AI-driven copilots and automation. See PrecisionAI for more details.

Palo Alto Networks and Prisma Cloud had a strong history of deep AI/ML usage across multiple products and features long before the GenAI wave reshaped the industry. However, the rapid evolution of AI capabilities accelerated the need for a long-term, comprehensive data strategy.

Databricks ecosystem in Prisma Cloud Architecture

We chose the Databricks Data Intelligence Platform as the best fit for our strategic direction and requirements, as it encompassed all the critical aspects needed to support our vision. With Databricks, we’ve significantly accelerated our data consolidation efforts and scaled innovative use cases—delivering measurable customer benefits within just six months of rollout.

In just the first year of integrating Databricks, Palo Alto Networks achieved a transformative, enterprise-wide impact that directly benefits both our customers and internal teams. By centralizing data workflows on the Databricks Platform, we significantly reduced complexity and accelerated innovation, enabling us to iterate on AI/ML features three times faster than before. Alongside this increased speed, we realized a 20% reduction in cost of goods sold and a 3x decrease in engineering development time.

Enhanced collaboration, fueled by Databricks Workflows, Databricks Unity Catalog for unified governance, and Databricks Auto Loader, has allowed us to deliver security solutions at unprecedented speed and scale. This has dramatically accelerated Prisma Cloud’s data processing and enabled us to bring impactful features to market faster than ever before.

The challenges of homegrown solutions

Prisma Cloud runs most of its infrastructure on AWS with a mature engineering tech stack built around AWS native services. Our team had extensive experience leveraging Apache Spark for ETL and analytical processing, running our infrastructure on AWS Glue and EMR.

Recognizing the need for a dedicated data platform, we initially developed a homegrown solution leveraging EMR, Glue and S3 as the foundation for our initial version. While this approach worked well with a small team, scaling it to support a broader data strategy and adoption across multiple teams quickly became a challenge. We found ourselves managing thousands of Glue jobs and multiple EMR clusters—all requiring enterprise-grade capabilities such as monitoring, alerting, reliability checks, and governance/security guardrails.

As our needs grew, so did the operational overhead. A significant portion of our engineering effort was diverted to maintaining what had effectively become an “Operating System” for our data platform rather than focusing on innovation and value-driven use cases.

While this effort addressed our strategic needs, we soon started running into several challenges in maintaining this version. Some of them are listed below:

  • Bespoke tooling and data transformations - Teams spent considerable time in meetings just identifying data attributes, locating them, and designing custom pipelines for each use case, which slowed development and collaboration.
  • Time-consuming infrastructure management - With multiple tuning parameters at the core of our Spark jobs, we struggled to develop a scalable, generic change management process. This added significant cognitive load to infrastructure teams responsible for managing clusters.
  • Cost management and budgeting - Managing EMR and Glue directly required manually setting multiple guardrails, including centralized observability across all stacks. As our projects grew, so did the headcount requirements for maintaining a more mature data platform.
  • Spark management - We also ran into challenges when updates to core Spark libraries were not yet supported on AWS, which left some of our jobs less efficient than the state of the art. Internal AWS limits on executor management forced us into extensive troubleshooting and recurring meetings to determine root causes.

Despite these challenges, our homegrown solution continued to scale, processing tens of millions of data mutations per hour for critical use cases. Looking ahead, however, we saw a clear need to migrate to a more mature platform—one that would allow us to retire in-house tooling and refocus engineering efforts on securing our customers' cloud environments rather than managing infrastructure.

Data architecture and its evolution at Prisma Cloud

At Prisma Cloud, we follow the 8-factor rule for any technical evaluation to assess its advantages and disadvantages. These factors are analyzed by our internal technical leadership committee, where we engage in discussions to reach a consensus. In cases where a factor cannot be adequately rated, we gather additional data through business-relevant prototyping to ensure a well-informed decision.

The key factors are listed below:

  • Functional fit - Does it solve our business needs?
  • Architecture/Design fit - Is it aligned with our long-term technical vision?
  • Developer adoption - How popular is it with developers today?
  • Stability/Ecosystem - Are there large-scale enterprises using this technology?
  • Deployment complexity - How much effort will deployment and change management require?
  • Cost - How does the COGS compare to the value of the features we plan to offer on top of this technology?
  • Comparative benchmarks - Are there existing benchmarks that prove comparable scale?

One of our key long-term goals was the ability to move towards a security data mesh model. Given our platform approach, we categorize data into 3 fundamental types:

  • Raw data - Data ingested directly from producers or modules as it enters the platform. In Databricks lakehouse terminology, this corresponds to the Bronze layer.
  • Processed data - The Prisma Cloud Platform is opinionated: it transforms raw data into normalized platform objects. This Processed data aligns with the Silver layer in lakehouse terminology.
  • Correlated data - This category unlocks net-new value by correlating different datasets, enabling advanced insights and analytics. It corresponds to the Gold layer in lakehouse terminology.

Unlike traditional data lakes, where Bronze data is often discarded, our platform’s breadth and depth necessitate a more evolutionary approach. Rather than simply transforming data into Gold datasets, we envision our data lake evolving into a data mesh, allowing for greater flexibility, accessibility, and cross-domain insights. The diagram below reflects the long-term capability that we seek to extract from our data lake investments.

All of our assessments were centered around the above philosophy.
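
To make the three tiers concrete, below is a minimal PySpark sketch of how signals might flow from Raw to Processed to Correlated tables. The catalog, schema, table, and column names are illustrative placeholders, not Prisma Cloud’s actual schema, and `spark` is the SparkSession that Databricks notebooks provide.

```python
from pyspark.sql import functions as F

# Raw (Bronze): module signals landed as-is when they enter the platform.
raw = spark.read.table("prisma.raw.asset_events")

# Processed (Silver): normalize raw events into opinionated platform objects.
processed = (
    raw.withColumn("ingested_at", F.to_timestamp("event_time"))
    .dropDuplicates(["asset_id", "event_time"])
    .select("asset_id", "account_id", "module", "ingested_at", "payload")
)
processed.write.mode("overwrite").saveAsTable("prisma.processed.assets")

# Correlated (Gold): join signals across modules to unlock net-new insight,
# e.g. open posture findings rolled up per account and module.
findings = spark.read.table("prisma.processed.posture_findings").select("asset_id", "finding_id")
correlated = (
    processed.join(findings, "asset_id")
    .groupBy("account_id", "module")
    .agg(F.countDistinct("finding_id").alias("open_findings"))
)
correlated.write.mode("overwrite").saveAsTable("prisma.correlated.findings_by_account")
```

Because the Raw layer is retained rather than discarded, each tier remains independently addressable, which is what allows the lake to evolve into the data mesh described above.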

Evaluation results

Apart from checking all the boxes in our new technology evaluation framework, the following key insights further cemented Databricks as our preferred data platform.

  1. Simplification of existing tech stack - Our infrastructure relied on several Glue and EMR jobs, many of which required ad-hoc tooling and repetitive maintenance. With Databricks, we identified an opportunity to eliminate 30%-40% of our jobs, allowing our engineers to focus on core business features instead of upkeep.
  2. Cost reduction - We saw at least a 20% drop in existing spend, even before factoring in amortization with accelerated adoption across various use cases.
  3. Platform features and ecosystem - Databricks provided immediate value through features such as JDBC URL exposure for data consumption (a small consumption sketch follows this list), built-in ML/AI infrastructure, automated model hosting, enhanced governance and access control, and advanced data redaction and masking. These capabilities were critical as we upgraded our data handling strategies for both tactical and strategic needs.
  4. Training and adoption ease - Onboarding new engineers onto Databricks proved significantly easier than having them build scalable ETL pipelines from scratch on AWS. This lowered the barrier to entry and accelerated the adoption of Spark-based technologies, which are essential at our scale.
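
As one concrete illustration of the data-consumption point in item 3, once data lands in Databricks, downstream services can query it through a SQL warehouse. The sketch below uses the Databricks SQL connector for Python as the Python-side equivalent of the JDBC path the post mentions; the hostname, HTTP path, token, and table name are placeholders that would come from configuration or a secret store in practice.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details for a Databricks SQL warehouse.
with sql.connect(
    server_hostname="dbc-example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi-REDACTED",
) as connection:
    with connection.cursor() as cursor:
        # Hypothetical Gold-layer table from the correlated tier.
        cursor.execute(
            "SELECT module, SUM(open_findings) AS findings "
            "FROM prisma.correlated.findings_by_account GROUP BY module"
        )
        for row in cursor.fetchall():
            print(row)
```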

Evaluation details

| Criteria | EMR/Glue (or cloud provider native tech) | Databricks |
| --- | --- | --- |
| Ease of deployment | Each team needs to work on its own deployment code; generally a sprint of work. | One-time integration that teams then adopt; SRE work was reduced to a few days. |
| Ease of admin | Maintaining versions and security patches; SREs generally take a few days. | SRE work is no longer needed. |
| Integrations | SRE needs to set up Airflow and ksql (generally a sprint of work for new teams). | Out of the box |
| MLflow | Need to buy a tool or adopt open source; each team needs to integrate (a few months the first time, a sprint of work per team). | Out of the box |
| Data catalog (requires data lineage, security, role-based access control, search, and data tagging) | Need to buy tools and integrate with Prisma. | Out of the box |
| Leverage ML libraries and AutoML | Need to buy and integrate with Prisma. | Out of the box |
| SPOG (single pane of glass) for developers and SRE | Not available with EMR/Glue. | Out of the box |
| DB SQL (SQL on S3 data) | Athena, Presto; SRE help is needed to integrate with Prisma. | Out of the box |

Application case study

Given our early pilots, we were convinced to start planning a migration path from our existing S3-based data lake onto the Databricks Platform. A perfect opportunity arose with a key insights project that required access to data from both Raw and Correlated layers to uncover net new security insights and optimize security problem resolution.

Before adopting Databricks, executing this type of project involved several complex and time-consuming steps:

  • Identifying data needs - A chicken-and-egg problem emerged: while we needed to define our data needs upfront, most insights required exploration across multiple datasets before determining their value.
  • Integration complexity - Once data needs were defined, we had to coordinate with data owners to establish integration paths—often leading to bespoke, one-off pipelines.
  • Governance & access control - Once all data was available, we had to ensure proper security and governance. This required manual configuration, with different implementations depending on where the data resided.
  • Observability and troubleshooting - With data pipeline monitoring split across multiple teams, resolving issues required significant cross-team coordination, making debugging highly use-case-specific.

We tested the impact of the Databricks Data Intelligence Platform on this critical project through the following steps:

  • Step 1: Infrastructure and Migration Planning

    We bootstrapped Databricks in our dev environments and started planning the migration of our in-house data lake on S3 onto Databricks. We utilized Databricks Asset Bundles and Terraform for both the migration and our infrastructure and resource deployment.

    Prior to adopting Databricks, engineers spent most of their time managing AWS infrastructure across various tools. With Databricks, we have a centralized platform to manage user and group cluster configurations.

    Databricks offers an enhanced Spark environment through Photon, providing a fully managed platform with optimized performance, whereas AWS primarily delivers Spark through its EMR service, which requires more manual configuration and does not achieve the same level of performance optimization as Databricks. Additionally, the ability to build, deploy, and serve models on Databricks has enabled us to scale more rapidly.
  • Step 2: Structuring Workstreams for Scale

    We divided the project into four workstreams on the Databricks platform: Data Catalog Management, Data Lake Hydration, Governance and Access Control, and Dev Tooling/Automation.

    Unity Catalog was essential for building our platform, providing unified governance and access controls in a single space. By utilizing attribute-based access control (ABAC) and data masking, we were able to obfuscate data as needed without slowing down development time.
  • Step 3: Accelerating Data Onboarding & Governance

    Catalog registration and onboarding of our existing data in our data lake took only a few hours while setting up governance and access control was a one-time effort.

    Unity Catalog provided a centralized platform for managing all permissions, simplifying the security of our entire data estate, including both structured and unstructured data. This encompassed governance for data, models, dashboards, notebooks, and more.
  • Step 4: Scaling Data Hydration & Observability

    We seamlessly integrated previously unavailable raw data into our existing data lake and prioritized its hydration onto the Databricks Platform. Capitalizing on comprehensive Kafka, database, and S3 integrations, we successfully developed production-grade hydration jobs, scaling to trillions of rows within just a few sprints (a sketch of such a hydration job follows these steps).

    In production, we rely extensively on Databricks Workflows, while interactive clusters support development, testing, and performance environments dedicated to building innovative features for our Prisma Cloud product. Databricks Serverless SQL underpins our dashboards, ensuring efficient monitoring of model drift and performance metrics. Moreover, system tables empower us to pinpoint and analyze high-cost jobs and runs over time, track significant budget fluctuations, and foster effective cost optimization and resource management.

    This holistic approach grants executives clear visibility into platform usage and consumption, streamlining observability and budgeting without relying on fragmented insights from multiple AWS tools such as EMR, Glue, SageMaker, and Neptune.
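
As a rough illustration of Steps 2 through 4, the sketch below shows what a minimal Auto Loader hydration job into a Unity Catalog table, with a column mask applied for governance, might look like. The S3 paths, table and column names, and the admin group are hypothetical placeholders; this is a sketch of the pattern, not our production pipeline.

```python
# Hydration: stream JSON drops from S3 into a Unity Catalog Bronze table with Auto Loader.
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/asset_events")
    .load("s3://example-bucket/raw/asset_events/")
)

query = (
    bronze_stream.writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/asset_events")
    .trigger(availableNow=True)  # use a processingTime trigger instead for continuous, near real-time ingest
    .toTable("prisma.raw.asset_events")
)
query.awaitTermination()

# Governance: a Unity Catalog column mask so only an admin group sees raw account identifiers.
spark.sql("""
    CREATE OR REPLACE FUNCTION prisma.raw.mask_account_id(account_id STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('prisma-admins') THEN account_id ELSE 'REDACTED' END
""")
spark.sql("""
    ALTER TABLE prisma.raw.asset_events
    ALTER COLUMN account_id SET MASK prisma.raw.mask_account_id
""")
```

Because the mask lives in Unity Catalog rather than in each pipeline, it applies uniformly across notebooks, Workflows jobs, and SQL warehouse queries, which is what kept governance a one-time setup in Step 3.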

The result

This consolidation proved transformative. Within a single week of prototyping, we uncovered valuable insights by combining raw, processed, and correlated data sets, enabling a more productive evaluation of product-market fit. As a result, we gained clear direction on which customer challenges to pursue and a stronger understanding of the impact we could deliver.

Within just six months of partnering with Databricks, we introduced a pivotal security innovation for our customers—an achievement that would have been virtually impossible given our former technology stack, expansive customer base, and the need to prioritize core security features.

Databricks utilization stats

  • ~3 trillion records crunched per day.
  • P50 processing time: < 30 mins.
  • Max parallelism: 24
  • Auto Loader utilization drops ingest latencies to seconds, offering near real-time processing.
  • Out-of-the-box features, such as AI/BI dashboards with system tables, helped development teams identify and analyze high-cost jobs and runs over time, monitor significant budget changes, and support effective cost optimization and resource management; a sample system tables query follows.
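
As a rough sketch of the kind of query those dashboards are built on, the example below pulls the highest-cost jobs from the documented system.billing.usage system table; the 30-day window and the limit are arbitrary illustrative choices.

```python
# Sketch: surface the highest-cost jobs over the last 30 days from Databricks system tables.
top_jobs = spark.sql("""
    SELECT usage_metadata.job_id AS job_id,
           SUM(usage_quantity)   AS dbus_consumed
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
      AND usage_metadata.job_id IS NOT NULL
    GROUP BY usage_metadata.job_id
    ORDER BY dbus_consumed DESC
    LIMIT 20
""")
top_jobs.show(truncate=False)
```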

Conclusion

As the above application case study showed, the timing of our growth aligned with Databricks emerging as the leading data platform of choice. Our shared commitment to rapid innovation and scalability made this partnership a natural fit.

By reframing the technical challenge of cloud security as a data problem, we were able to seek out technology providers who were experts in this area. This strategic shift allowed us to focus on depth, leveraging Databricks’ powerful platform while applying our domain intelligence to tailor it for our scale and business needs. Ultimately, this collaboration has empowered us to accelerate innovation, enhance security insights, and deliver greater value to our customers.

Read more about the Databricks and Palo Alto Networks collaboration here.
