Fighting Cyber Threats in the Public Sector with Scalable Analytics and AI

Databricks provides governmental agencies with the big data tools and technology to prevent and minimize cybersecurity threats

Published: May 13, 2020

by Michael Ortega, Arun Pamulapati and Zafer Bilaloglu

In 2019, there were 7,098 data breaches exposing over 15.1 billion records. That equates to roughly one breach every hour and fifteen minutes. The Public Sector is a prime target, with cyber criminals and nation states launching a constant barrage of attacks focused on disrupting government operations and obtaining a political edge. In fact, 88% of public sector organizations report that they have faced at least one cyber attack over the past two years.

Local governments and federal agencies need to be more vigilant and defensive than ever before. To prevent attacks, information security teams need to build a holistic view of the threat environment. But this is no easy task. In today’s digital, mobile, and connected world, confidential and sensitive data is being accessed and shared across a growing list of applications and network endpoints. This creates hundreds of thousands of events and hundreds of terabytes of data every month that need to be analyzed and contextualized in near real-time. For most agencies, this creates a big data problem.

Traditional Security Tools are Falling Short

To help manage this effort, many local and federal agencies have invested in traditional SIEM tools. While these threat intelligence tools are great for monitoring known threat patterns, most were built for the on-premises world. Scaling them for terabytes of data requires an expensive infrastructure build-out. And even cloud-based SIEM tools typically charge per GB of data ingested. This makes scaling threat detection tools for large volumes of data cost-prohibitive. As a result, most agencies store a few weeks of threat data at best. This can be a real problem in scenarios where a perpetrator gains access to a network but waits months before doing anything malicious. Without a long historical record, security teams can’t analyze cyberattacks over long time horizons or conduct deep forensic reviews.
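To make the retention problem concrete, here is a minimal Python sketch of the scenario described above. The function name, the dates, and the retention windows are all hypothetical values chosen for illustration, not figures from any agency or product.

```python
from datetime import date, timedelta


def initial_access_still_retained(initial_access: date,
                                  detected_on: date,
                                  retention_days: int) -> bool:
    """Return True if logs from the initial access date are still
    inside the retention window on the day the attack is detected."""
    oldest_retained = detected_on - timedelta(days=retention_days)
    return initial_access >= oldest_retained


# Hypothetical timeline: the intruder gets in, then waits 105 days.
initial_access = date(2020, 1, 6)
detected_on = date(2020, 4, 20)

# Hypothetical retention policies: three weeks versus one year.
short_window = initial_access_still_retained(initial_access, detected_on, retention_days=21)
long_window = initial_access_still_retained(initial_access, detected_on, retention_days=365)

print("3-week retention covers initial access:", short_window)
print("1-year retention covers initial access:", long_window)
```

With only a few weeks of data, the evidence of how the intruder got in has already aged out by the time the malicious activity is noticed.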

Beyond scaling challenges, many legacy SIEM tools lack the critical infrastructure -- advanced analytics, graph processing and machine learning capabilities -- needed to detect unknown threat patterns or deliver on a broader set of security use cases like behavioral analytics. For example, a rules-based SIEM might not detect questionable employee behavior such as an employee emailing sensitive documents to their personal email address right before they quit. In these scenarios, machine learning models are needed to detect anomalous behavior patterns across a broader set of non-traditional data sets.
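As a rough illustration of the difference between a fixed rule and a behavioral baseline, the sketch below flags a day on which an employee sends far more attachments to external addresses than their own history would suggest. It uses a simple per-employee z-score as a stand-in for a trained machine learning model. The function names, the threshold, and the sample counts are assumptions made up for this example.

```python
from statistics import mean, stdev


def zscore(history, today):
    """How many standard deviations today's count sits above the
    employee's own historical average."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the baseline: any change at all is extreme.
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma


def is_anomalous(history, today, threshold=3.0):
    """Flag the day if it exceeds the (hypothetical) z-score threshold."""
    return zscore(history, today) > threshold


# Hypothetical daily counts of attachments one employee sent to
# external addresses over the previous ten working days.
history = [0, 1, 0, 2, 1, 0, 1, 0, 1, 1]

print(is_anomalous(history, today=2))   # within normal variation
print(is_anomalous(history, today=12))  # sharp spike
```

A static rule such as "alert on more than 20 external attachments per day" would miss the spike to 12, while the baseline comparison catches it because it is unusual for this particular employee. A production model would draw on many more signals than a single count.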

Augmenting Threat Detection with Big Data Technologies

To prevent threats in today’s environment, government agencies need to find a better, more cost-effective way to process, correlate and analyze massive amounts of real-time and historical data. Fortunately, the Databricks Unified Data Analytics Platform, along with popular open-source tools Apache Spark™ and Delta Lake, offers agencies a path forward:

Holistic, Real-time Threat Analysis – Native to the cloud and built on Apache Spark™, Databricks is optimized to process large volumes of threat data in real-time. This enables government agencies to quickly query petabytes of data stretching years into the past, which is critical for forensic reviews and profiling long-term threats. Delta Lake, an open-source storage layer that brings ACID transactions to big data workloads, is natively integrated into Databricks, providing additional optimizations that significantly accelerate queries of structured and unstructured data sets. Delta Lake also enables security teams to easily and reliably combine batch and streaming data sources, which is critical for detecting new threats as they happen [watch this keynote to see how one of the world’s largest tech companies uses Delta Lake to scale threat detection].

Next Gen Threat Detection – Machine learning is critical to uncovering unknown threat patterns across broad sets of data. Databricks’ collaborative notebook environment provides data scientists with built-in machine learning libraries and the tooling they need to rapidly experiment with advanced analytics. With these capabilities, data scientists have the flexibility to build predictive machine learning models that support a broad set of security use cases such as reducing false positives produced by SIEM tools, uncovering suspicious employee behavior, detecting complex malware and more.

Cost Efficient Scale – The Databricks platform is a fully managed cloud service with cost-efficient pricing designed for big data processing. Cloud clusters scale up and down automatically so teams only use the compute needed for the job. Security teams no longer need to absorb the costly burden of building and maintaining a homegrown cybersecurity analytics platform or paying per GB of data ingested.

Improve Your Agency’s Security Posture with Big Data and AI

As cyber criminals continue to evolve their techniques, local and national government security teams need to evolve their cybersecurity strategies and the ways they detect and prevent threats. Big data analytics and machine learning technologies provide government agencies a path forward, but choosing the right platform is critical to success.