Tuesday 9 February 2021
1.00pm GMT | 2.00pm CET

Drug development is complex. Bringing a new therapeutic to market can take more than 5 years and cost over $1B. This drives up the cost of new medicines, while patients continue to suffer from unmet medical needs. Researchers today are generating huge amounts of data through gene expression studies, high-throughput screens, simulations and imaging platforms. Additionally, researchers are starting to make broader use of real-world data, whether genomics data from biobanks, images and EHR data from hospitals, or results published in journals. While so much data is accessible, many researchers struggle to use it to streamline their experiments.

In this virtual workshop, we’ll walk through how biomedical researchers are using the Databricks Unified Data Analytics Platform to efficiently curate, query and learn from vast quantities of data in the cloud. We will look at end-to-end drug discovery and demonstrate how a unified approach to data and AI makes it possible to efficiently identify high-quality targets and develop new, well-characterised leads. We will then show you how to curate your data into a central research data lake. The session will close with these follow-along demos:

  • Validate target functionality by applying machine learning to gene expression data
  • Use deep learning to predict activity profiles for new small molecules from previous structure/assay relationships


1.00-1.15pm Improving the efficiency of pharma R&D with Unified Data Analytics
1.15-1.30pm Building a Research Data Lake with Databricks, Delta Lake and Apache Spark™
1.30-1.40pm Break
1.40-2.25pm Data Engineering Demo:

  • ETL Gene Expression into a Research Data Lake
  • Use ML to Validate Gene Expression Patterns

2.25-2.40pm Q&A

Space is limited for this event. Sign up today to reserve your place.