Augment Your SIEM for Cybersecurity at Cloud Scale

Published: July 23, 2021

Over the last decade, security incident and event management tools (SIEMs) have become a standard in enterprise security operations. SIEMs have always had their detractors. But the explosion of cloud footprints is prompting the question, are SIEMs the right strategy in the cloud-scale world? Security leaders from HSBC don’t think so. In a recent talk, Empower Splunk and Other SIEMs with the Databricks Lakehouse for Cybersecurity, HSBC highlighted the limitations of legacy SIEMs and how the Databricks Lakehouse Platform is transforming cyberdefense. With $3 trillion in assets, HSBC’s talk warrants some exploration.

In this blog post, we will discuss the changing IT and cyber-attack threat landscape, the benefits of SIEMs, the merits of the Databricks Lakehouse and why SIEM + Lakehouse is becoming the new strategy for security operations teams. Of course, we will talk about my favorite SIEM! But I warn you, this isn’t a post about critiquing “legacy technologies built for an on-prem world.” This post is about how security operations teams can arm themselves to best defend their enterprises against advanced persistent threats.

The enterprise tech footprint

Some call it cloud-first and others call it cloud-smart. Either way, it is generally accepted that every organization is involved in some sort of cloud transformation or evaluation -- even in the public sector, where onboarding technology isn’t a light decision. As a result, the main US cloud service providers all rank within the top 5 largest market cap companies in the world. As tech footprints are migrating to the cloud, so are the requirements for cybersecurity teams. Detection, investigation and threat hunting practices are all challenged by the complexity of the new footprints, as well as the massive volumes of data. According to IBM, it takes 280 days on average to detect and contain a security breach. According to HSBC’s talk at Data + AI Summit, 280 days would mean over a petabyte of data -- just for network and EDR (endpoint threat detection and response) data sources.

When an organization needs this much data for detection and response, what are they to do? Many enterprises want to keep the cloud data in the cloud. But what about from one cloud to the other? I spoke to one large financial services institution this week who said, “We pay over a $1 million in egress cost to our cloud provider.” Why? Because their current SIEM tool is on one cloud service and their largest data producers are on another. Their SIEM isn’t multi-cloud. And over the years, they have built complicated transport pipelines to get data from one cloud provider to the other. Complications like this have warped their expectations from technology. For example, they consider 5-minute delays in data to be real time. I present this here as a reality of what modern enterprises are confronted with -- I am sure the group I spoke with is not the only one with this complication.

Security analytics in the cloud world

The cloud terrain is really messing with every security operations team’s m.o. What was called big data 10 years ago is puny data by today’s cloud standards. With the scale of today’s network traffic, gigabytes are now petabytes, and what used to take months to generate now happens in hours. The stacks are new and security teams are having to learn them. Mundane tasks like, “have we seen these IPs before” are turning into hours or days-long searches in SIEM and logging tools. Slightly more sophisticated contextualization tasks, like adding the user’s name to network events, are turning into near impossible ordeals. And if one wants to do streaming enrichments of external threat intelligence at terabytes of data per day -- good luck -- hope you have a small army and a deep pocket. And we haven’t even gotten to anomaly detection or threat hunting use cases. This is by no means a jab at SEIMs. In reality, the terrain has changed and it’s time to adapt. Security teams need the best tools for the job.

What capabilities do security teams need in the cloud world? First and foremost, an open platform that can be integrated with the IT and security tool chains and does not require you to provide your data to a proprietary data store. Another critical factor is a multi-cloud platform, so it can run on the clouds (plural) of your choice. Additionally, a scalable and highly-performant analytics platform, where compute and storage are decoupled that can support end-to-end streaming AND batch processing. And finally, a unified platform to empower data scientists, data engineers, SOC analysts and business analysts -- all data people. These are the capabilities of the Databricks Lakehouse Platform.

The SaaS and auto-scaling capabilities of Databricks simplify the use of these sophisticated capabilities. Databricks security customers are crunching across petabytes of data in sub ten minutes. One customer is able to collect from 15+ million endpoints and analyze the threat indicators in under an hour. A global oil and gas producer, paranoid about ransomware, runs multiple analytics and contextualizes every single powershell execution in their environment -- analysts only see high confidence alerts.

Lakehouse + SIEM : The pattern for cloud-scale security operations

George Webster, Head of Cybersecurity Sciences and Analytics at HSBC, describes the Lakehouse + SIEM is THE pattern for security operations. It leverages the strengths of the two components: a lakehouse architecture for multicloud-native storage and analytics, and SIEM for security operations workflows. For Databricks customers, there are two general patterns for this integration. But they are both underpinned by what Webster calls, The Cybersecurity Data Lake with Lakehouse.

The first pattern: The lakehouse stores all the data for the maximum retention period. A subset of the data is sent to the SIEM and stored for a fraction of the time. This pattern has the advantage that analysts can query near-term data using the SIEM while having the ability to do historical analysis and more sophisticated analytics in Databricks. And manage any licensing or storage costs for the SIEM deployment.

The second pattern is to send the highest volume data sources to Databricks, (e.g. cloud native logs, endpoint threat detection and response logs, DNS data and network events). Comparatively low volume data sources go to the SIEM, (e.g. alerts, email logs and vulnerability scan data). This pattern enables Tier 1 analysts to quickly handle high-priority alerts in the SIEM. Threat hunt teams and investigators can leverage the advanced analytical capabilities of Databricks. This pattern has a cost benefit of offloading processing, ingestion and storage off of the SIEM.

Integrating the Lakehouse with Splunk

What would a working example look like? Because of customer demand, the Databricks Cybersecurity SME team created the Databricks add-on for Splunk. The add-on allows security analysts to run Databricks queries and notebooks from Splunk and receive the results back into Splunk. A companion Databricks notebook enables Databricks to query Splunk, get Splunk results and forward events and results to Splunk from Databricks.

With these two capabilities, analysts on the Splunk search bar can interact with Databricks without leaving the Splunk UI. And Splunk search builders or dashboards can include Databricks as part of their searches. But what's most exciting is that security teams can create bi-directional, analytical automation pipelines between Splunk and Databricks. For example, if there is an alert in Splunk, Splunk can automatically search Databricks for related events, and then add the results to an alerts index or a dashboard or a subsequent search. Or conversely, a Databricks notebook code block can query Splunk and use the results as inputs to subsequent code blocks.

With this reference architecture, organizations can maintain their current processes and procedures, while modernizing their infrastructure, and become multi-cloud native to meet the cybersecurity risks of their expanding digital footprints.

Achieving scale, speed, security and collaboration

Since partnering with Databricks, HSBC has reduced costs, accelerated threat detection and response, and improved their security posture. Not only can the financial institution process all of their required data, but they've increased online query retention from just days to many months at the PB scale. The gap between an attacker's speed and HSBC's ability to detect malicious activity and conduct an investigation is closing. By performing advanced analytics at the pace and speed of adversaries, HSBC is closer to their goal of moving faster than bad actors.

As a result of data retention capabilities, the scope of HSBC threat hunts has expanded considerably. HSBC is now able to execute 2-3x more threat hunts per analyst, without the limitations of hardware. Through Databricks notebooks, hunts are reusable and self-documenting, which keeps historical data intact for future hunts. This information, as well as investigation and threat hunting life cycles, can now be shared between HSBC teams to iterate and automate threat detection. With efficiency, speed and machine learning/artificial intelligence innovation now available, HSBC is able to streamline costs, reallocate resources, and better protect their business-critical data.