
At Databricks, the security of our platform and customers is our single highest priority. One important component of our security program is collaboration with the external security community through our bug bounty program. Over the last three years, we have accepted and awarded nearly 260 valid bounty submissions from nearly 150 security researchers. In two earlier blog posts, we described examples of our collaboration with security researchers on discovering and protecting against novel threat vectors.

In this blog post, we share a recent, noteworthy report submitted to our bug bounty program that led to material security enhancements.

Early Detection and Collaboration

In late June, our automated, real-time security monitoring pipelines detected unusual activity from multiple sources in our development environment, including access to a significant portion of our GitHub repositories, such as those supporting the Databricks platform services. We immediately opened a security investigation to identify and remediate the root cause. While that investigation and response were under way, a security researcher submitted a bug bounty report describing the same activity we had observed. As we worked with the researcher to trace their access, we also continued our own independent investigation in accordance with our incident response plan.
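One simple signal a monitoring pipeline might compute is how broadly a single identity reaches across repositories in a short window. The sketch below is a hypothetical illustration, not Databricks' actual detection logic: the `RepoAccessEvent` record, the `flag_broad_repo_access` function, and the 25% default threshold are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RepoAccessEvent:
    """Hypothetical, simplified audit-log record: who touched which repo."""
    actor: str
    repo: str


def flag_broad_repo_access(
    events: Iterable[RepoAccessEvent], total_repos: int, threshold: float = 0.25
) -> List[str]:
    """Return actors whose distinct repositories exceed `threshold` of the total."""
    repos_by_actor = defaultdict(set)
    for event in events:
        # Count each repository once per actor, however often it is accessed.
        repos_by_actor[event.actor].add(event.repo)
    return sorted(
        actor
        for actor, repos in repos_by_actor.items()
        if len(repos) / total_repos > threshold
    )
```

A real pipeline would likely combine a breadth signal like this with others (source network, credential age, time of day) before raising an alert.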

Because our response was under way by the time the researcher's report arrived, our incident response and threat hunting teams had already identified the researcher's activity, which was limited to that of a non-privileged user on internal systems such as GitHub and one of our Okta environments. We mitigated the impact through tactical steps such as rotating credentials and extending contextual access to more applications. Although we assessed the risk as very low, we proactively performed a manual review of every GitHub pull request made while the researcher was active to confirm that there were no unexpected requests or modifications.

Databricks employs a Defense in Depth security strategy: we maintain many layers of security defenses that are independent of each other, so if someone breaks through one layer, other layers protect against further access. For example, logging in to our production environment requires multi-factor authentication with a physical FIDO2 YubiKey, from a Databricks-owned device on our VPN, using contextual access. In this case, that strategy isolated our production environment from the part of the development environment the researcher was able to access. As a result, the researcher was not able to access any customer data as part of their research.

Our investigation determined that the researcher had downloaded a public Docker container image, published by an employee on their personal account, that inadvertently contained credentials. The researcher went one level deeper by systematically scanning the image's filesystem to discover credentials they could use to access GitHub, and used a probing tool to check whether those credentials were still active. These credentials allowed the researcher to access our GitHub repositories.
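To illustrate why scanning "one level deeper" matters, here is a minimal Python sketch of the kind of check a defender might run on an image before publishing it. It walks a tar archive (such as the output of `docker save`, whose image layers are themselves tar archives), descends into nested archives, and reports strings matching an illustrative GitHub token pattern. The function names and the regular expression are assumptions for this example; production secret scanners use far larger rule sets.

```python
import io
import re
import tarfile

# Illustrative pattern only (an assumption): GitHub classic tokens carry a
# "ghp_", "gho_", "ghu_", "ghs_" or "ghr_" prefix plus 36 alphanumerics.
TOKEN_PATTERN = re.compile(rb"gh[pousr]_[A-Za-z0-9]{36}")


def _looks_like_tar(content: bytes) -> bool:
    """True if `content` parses as a (possibly compressed) tar archive."""
    try:
        with tarfile.open(fileobj=io.BytesIO(content)):
            return True
    except tarfile.TarError:
        return False


def scan_tar(data: bytes, prefix: str = "", depth: int = 0, max_depth: int = 3) -> list:
    """Return (path, token) pairs for token-like strings in a tar archive,
    descending into nested archives such as image layers."""
    findings = []
    with tarfile.open(fileobj=io.BytesIO(data)) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and devices
            handle = archive.extractfile(member)
            if handle is None:
                continue
            content = handle.read()
            path = prefix + member.name
            if depth < max_depth and _looks_like_tar(content):
                # Go one level deeper: scan the files inside the nested archive.
                findings.extend(scan_tar(content, path + "!/", depth + 1, max_depth))
                continue
            for match in TOKEN_PATTERN.finditer(content):
                findings.append((path, match.group().decode("ascii")))
    return findings
```

Run against a candidate image before it is pushed to a public registry, a check like this is one possible form of a pre-publication control; a shallow scan that stops at the outer archive would see only opaque layer files.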

Security Enhancement Follow-up

Bug bounty reports like this help Databricks identify gaps in our existing defense strategies and directly lead to security hardening work that benefits Databricks and our customers.

As a result of this report, we conducted an extensive Red Team exercise and took steps to expand our Defense in Depth security strategy:

  1. We have expanded our existing security monitoring capabilities to probe multiple layers deeper into all scanned assets, and to scan more services.
  2. We have implemented several additional controls to block secrets from being shared on external services.
  3. We have expanded our Contextual Access as one of the defense layers. Contextual Access is a more recent security paradigm that differentiates the privileges received by the same credentials based on the host or network they’re being used on, and allows for deep device configuration checks. With this capability, our most sensitive assets are inaccessible from anything other than a managed Databricks device in good health on the right network without otherwise impacting the user experience.
  4. We have augmented our existing GitHub defenses by expanding network restrictions to limit source code repository access to authorized IP addresses. This complements our interactive access protections to also cover API access and automated GitHub actions.
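The ideas in items 3 and 4 can be combined in a minimal, hypothetical policy check: the same credential yields different effective scopes depending on the network a request comes from and the health of the device making it. The scope names, the `AccessContext` fields, and the network ranges (the RFC 5737 documentation blocks) are illustrative assumptions, not Databricks' actual configuration or any vendor's API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical authorized egress ranges (RFC 5737 documentation blocks).
AUTHORIZED_NETWORKS = (ip_network("203.0.113.0/24"), ip_network("198.51.100.0/24"))

# Hypothetical scopes that additionally require a managed, healthy device.
SENSITIVE_SCOPES = frozenset({"repo:write", "prod:login"})


@dataclass(frozen=True)
class AccessContext:
    """Where a request comes from and what is known about the device."""
    source_ip: str
    device_managed: bool
    device_healthy: bool


def effective_scopes(granted: frozenset, ctx: AccessContext) -> frozenset:
    """Return the subset of a credential's scopes usable in this context."""
    addr = ip_address(ctx.source_ip)
    if not any(addr in net for net in AUTHORIZED_NETWORKS):
        # Network restriction: unknown source addresses get no access,
        # no matter how valid the credential itself is.
        return frozenset()
    if not (ctx.device_managed and ctx.device_healthy):
        # Same credential, weaker context: drop the most sensitive scopes.
        return granted - SENSITIVE_SCOPES
    return granted


if __name__ == "__main__":
    token_scopes = frozenset({"repo:read", "repo:write", "prod:login"})
    personal_pc = AccessContext("203.0.113.8", device_managed=False, device_healthy=True)
    print(sorted(effective_scopes(token_scopes, personal_pc)))  # ['repo:read']
```

Failing closed on an unrecognized network, and only then relaxing privileges according to device posture, means a leaked credential used from an unknown address yields no access at all.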

We would like to thank the security researcher who helped us through this bug bounty journey and all of the security researchers who are working with us to make Databricks more secure every day. If you are a security researcher, we hope to see you soon at hackerone.com/databricks.