Customer Case Study

Regeneron is a leading biotechnology company using the power of science to bring new medicines to patients in need.

Vertical Use Case

  • Genomic and clinical data is highly decentralized, making it very difficult to analyze and train models against the entire dataset

Technical Use Case

  • Data Ingest and ETL
  • Machine Learning

The Challenges

  • More than 95% of the experimental medicines currently in the pipeline are expected to fail, even though the industry spends billions of dollars on R&D every year
  • Genomic and clinical data is highly decentralized, making it very difficult to analyze and train models against the entire dataset
  • Scaling to support the billions of data points they want to mine effectively is difficult and costly
  • They would spend days just wrangling and running ETL on the data before it could be used for analytics

The Solution

Databricks provides Regeneron with a unified analytics platform running on Amazon Web Services that simplifies operations and accelerates drug discovery through improved data science productivity. This lets them analyze the data in ways that were previously impossible.

  • Automated cluster management simplifies the provisioning of clusters, reducing time spent on DevOps work so engineers and data scientists can spend more time on high-value tasks
  • Interactive workspace allows data scientists to share data and insights, fostering an environment of transparency and collaboration
  • Significant ETL performance gains reduce the time it takes to process the data from weeks to just a few hours
  • Faster queries cut response times from more than 30 minutes to just a few seconds

With Databricks we have been able to reduce the time it takes to ETL massive amounts of data from weeks to just a few hours.

Lukas Habegger, Associate Director of Bioinformatics at the Regeneron Genetics Center