Life Sciences Data + AI Workshop

Improving the efficiency of pharma R&D with unified data analytics

The field of genomics has matured to the point where organizations are sequencing DNA at population scale. However, transforming raw DNA sequence data into a format suitable for analysis has become the new bottleneck to genomic discovery. Teams typically glue together a series of bioinformatics tools with custom scripts and process data on single-node machines, one sample at a time. As a result, bioinformatics teams spend more time building and maintaining pipelines than modeling data. To ease the burden of analyzing population-scale genomic data, we have introduced the Databricks Unified Data Analytics Platform for Genomics. This platform simplifies the end-to-end process of turning raw sequencing data into actionable insights at scale. At its core is Glow, an open-source collaboration between the Regeneron Genetics Center® and Databricks. Glow is a bioinformatics tool built on Apache Spark™ and Delta Lake that makes it easy to blend bioinformatics workflows with the open-source data science ecosystem.

In this on-demand workshop, we’ll walk through how the Databricks Unified Data Analytics Platform for Genomics makes it simple to deploy Spark-based bioinformatics tools in the cloud, accelerate common genomic analyses and apply machine learning techniques.

Join this hands-on workshop to learn how to:

  • Call variants, both in a single sample and across multiple samples, using our accelerated GATK4 pipelines
  • Use Spark SQL and Glow to:
      • Characterize the association of variants in a population with phenotypes
      • Apply whole-genome regression to model genome-wide disease risk across multiple variants associated with a phenotype of interest