As the lakehouse becomes increasingly mission-critical to data-forward organizations, so too grows the risk that unexpected events, outages, and security incidents may derail their operations in new and unforeseen ways. Databricks offers several key observability features to help customers get ahead of this new set of threats and give them visibility into their lakehouse like never before.
From a security standpoint, one of the ways that organizations have adapted to the modern world is to rely on the principle of "Never Trust, Always Verify" by following a Zero Trust Architecture (ZTA) model. In this blog, we're going to show you how to get started with a ZTA on your Databricks Lakehouse Platform, and share a Databricks Notebook that will automatically generate a series of SQL queries and alerts for you. If you normally use Terraform for this kind of thing, we've got you covered too: simply check out the code here.
System Tables serve as a centralized operational data store, backed by Delta Lake and governed by Unity Catalog. System tables can be queried in any language, allowing them to be used as the basis for a wide range of different use cases, from BI to AI and even Generative AI. Some of the most common use cases we've started to see customers implement on top of System Tables are:
Although a number of different schemas are available, in this blog we're largely going to focus on the `system.access.audit` table, and more specifically how it can be used to augment a Zero Trust Architecture on Databricks.
The `system.access.audit` table serves as your system of record for all of the material events happening on your Databricks Lakehouse Platform. Some of the use cases that might be powered by this table are:
To analyze these kinds of logs in the past, customers needed to set up cloud storage, configure cloud principals and policies, and then build and schedule an ETL pipeline to process and prepare the data. Now, with the announcement of System Tables, all you need to do is opt in and all of the data you need will be automatically made available to you. Best of all, it works exactly the same across all supported clouds.
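To get a feel for the data, here's a minimal sketch of an exploratory query (the column names follow the documented `system.access.audit` schema):

```sql
-- A first look at audit activity: the most common actions
-- across the platform over the last 24 hours.
SELECT
  service_name,
  action_name,
  COUNT(*) AS num_events
FROM system.access.audit
WHERE event_time >= current_timestamp() - INTERVAL 24 HOURS
GROUP BY service_name, action_name
ORDER BY num_events DESC
LIMIT 20;
```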
Some of the key concepts of a Zero Trust Architecture (ZTA) are:
In this blog we'll mainly focus on monitoring, but it's worth briefly mentioning how Unity Catalog helps data-forward organizations implement a Zero Trust Architecture across its wider feature set too:
- `system.information_schema` provides a centralized view of who has access to which securables, allowing administrators to easily identify their most privileged users.
- While `information_schema` provides a current view, it can be combined with `system.access.audit` logs to monitor grants, revocations, and privilege changes over time (see the sketch below).
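As a minimal sketch of tracking permission changes over time, you might query the audit table like this. Note that the `action_name` value and `request_params` keys here are assumptions for illustration; verify them against the events actually emitted in your own logs:

```sql
-- Recent Unity Catalog permission changes, newest first.
-- NOTE: 'updatePermissions' and the request_params keys are assumptions;
-- check the action names present in your own audit logs.
SELECT
  event_time,
  user_identity.email AS changed_by,
  request_params['securable_type'] AS securable_type,
  request_params['securable_full_name'] AS securable,
  request_params['changes'] AS changes
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'updatePermissions'
  AND event_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
```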
Effective monitoring is one of the key foundations of an effective Zero Trust Architecture. All too often, people can be lured into the trap of thinking that it's sufficient to capture the logs they might need and only query them in the event of an investigation or incident. But in order to align with the "Never Trust, Always Verify" principle, we're going to have to be more proactive than that. Thankfully, with Databricks SQL it's easy to write SQL queries against the `system.access.audit` table, and then schedule them to run automatically, instantly notifying you of potentially suspicious events.
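For example, here's a minimal sketch of the kind of query you might schedule as an alert, looking for repeated failed logins. The `action_name`, status code, and threshold are assumptions for illustration; the generated queries described below are the authoritative versions:

```sql
-- Users and source IPs with repeated failed logins in the last 24 hours.
-- NOTE: the 'login' action name, 401 status code, and threshold of 5
-- are assumptions for illustration.
SELECT
  user_identity.email AS actor,
  source_ip_address,
  COUNT(*) AS failed_attempts
FROM system.access.audit
WHERE action_name = 'login'
  AND response.status_code = 401
  AND event_time >= current_timestamp() - INTERVAL 24 HOURS
GROUP BY user_identity.email, source_ip_address
HAVING COUNT(*) >= 5
ORDER BY failed_attempts DESC;
```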
Clone the repo into your Databricks workspace (see the docs for AWS and Azure) and run the `create_queries_and_alerts` notebook. Some examples of the 40+ queries and alerts that will be automatically generated for you are:
| Query / Alert Name | Query / Alert Description |
|---|---|
| Repeated Failed Login Attempts | Repeated failed login attempts over a 60-minute period within the last 24 hours. |
| Data Downloads from the Control Plane | High numbers of downloads of results from notebooks, Databricks SQL, Unity Catalog volumes and MLflow, as well as the exporting of notebooks in formats that may contain query results, within the last 24 hours. |
| IP Access List Failures | All attempts to access your account or workspace(s) from untrusted IP addresses within the last 24 hours. |
| Databricks Access to Customer Workspaces | All logins to your workspace(s) via the Databricks support process within the last 24 hours. This access is tied to a support ticket and complies with your workspace configuration, which may disable such access entirely. |
| Destructive Activities | High number of delete events over a 60-minute period within the last 24 hours. |
| Potential Privilege Escalation | High number of permissions changes over a 60-minute period within the last 24 hours. |
| Repeated Access to Secrets | Repeated attempts to access secrets over a 60-minute period within the last 24 hours. This could be used to detect attempts to steal credentials. |
| Repeated Unauthorized Data Access Requests | Repeated unauthorized attempts to access Unity Catalog securables over a 60-minute period within the last 24 hours. Repeated failed requests could indicate privilege escalation, data exfiltration attempts, or an attacker trying to brute force access to your data. |
| Antivirus Scan Infected Files Detected | For customers using our Enhanced Security and Compliance Add-On, detect any infected files found on the hosts within the last 24 hours. |
| Suspicious Host Activity | For customers using our Enhanced Security and Compliance Add-On, detect suspicious events flagged by the behavior-based security monitoring agent within the last 24 hours. |
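To give a flavor of what these generated queries look like, here's a minimal sketch in the spirit of the "Destructive Activities" alert. The action-name pattern and threshold are assumptions; the notebook generates the authoritative versions:

```sql
-- Users with a high number of delete events in any 60-minute window
-- over the last 24 hours. The LIKE pattern and the threshold of 10
-- are illustrative assumptions.
SELECT
  window(event_time, '60 minutes') AS time_window,
  user_identity.email AS actor,
  COUNT(*) AS delete_events
FROM system.access.audit
WHERE event_time >= current_timestamp() - INTERVAL 24 HOURS
  AND action_name ILIKE 'delete%'
GROUP BY window(event_time, '60 minutes'), user_identity.email
HAVING COUNT(*) > 10
ORDER BY delete_events DESC;
```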
Once you have run the notebook, if you scroll to the bottom you'll see an HTML table with links to each of the SQL queries and alerts:
To test the alert, simply follow the link and select refresh. If it hasn't been triggered you'll see a green OK in the top left:
If it has been triggered you'll receive an email with a table including all of the events that have triggered the alert. To set the alert to run on a schedule, just select edit and then Refresh and choose how often you want the alert to run:
You can customize the alert further as needed, including by adding additional notification destinations such as email addresses, Slack channels, or MS Teams (among others!). Check out our documentation for AWS and Azure for full details on all of the options.
The monitoring and detection use cases we've looked at so far are relatively simple, but because System Tables can be queried in any language, the options are essentially perimeter-less! Consider the following ideas:
- Using `system.information_schema` to tailor our monitoring to focus on our most privileged users (see the sketch after this list)
- Checking out the `geolocation_function_and_queries.sql` notebook for an example of what geolocation-based enrichment of your monitoring might look like
- Using `system.information_schema` to provide answers to questions about our data in natural language, to both data teams and business users alike
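Here's a minimal sketch of the first idea. The definition of "privileged" used here (the top grantees by number of table privileges) is an assumption you'd tune to your own environment:

```sql
-- Recent audit activity for the users holding the most table privileges.
WITH privileged_users AS (
  SELECT grantee, COUNT(*) AS num_privileges
  FROM system.information_schema.table_privileges
  GROUP BY grantee
  ORDER BY num_privileges DESC
  LIMIT 25  -- "top 25 grantees" is an illustrative cutoff
)
SELECT
  a.event_time,
  a.user_identity.email AS actor,
  a.service_name,
  a.action_name,
  p.num_privileges
FROM system.access.audit AS a
JOIN privileged_users AS p
  ON a.user_identity.email = p.grantee
WHERE a.event_time >= current_timestamp() - INTERVAL 24 HOURS
ORDER BY a.event_time DESC;
```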
In the year since our last blog about audit logging, both the Databricks Lakehouse Platform and the world have changed significantly. Back then, just getting access to the data you needed required a number of steps before you could even think about how to generate actionable insights. Now, thanks to System Tables, all of the data you need is just a button click away. A Zero Trust Architecture (or "Never Trust, Always Verify" as it has come to be known) is just one of many use cases that can be powered by System Tables. Check out the docs for AWS and Azure to enable System Tables for your Databricks account today!