Genomics has matured to the point where organizations are sequencing DNA at population scale. However, transforming raw DNA sequence data into a format suitable for analysis has become the new bottleneck to genomic discovery. Teams typically glue together a series of bioinformatics tools with custom scripts and process data on single-node machines, one sample at a time. As a result, bioinformatics teams spend more time building and maintaining pipelines than modeling data.
To ease the burden of analyzing population-scale genomic data, we have introduced the Databricks Unified Data Analytics Platform for Genomics. This platform simplifies the end-to-end process of turning raw sequencing data into actionable insights at scale. At its core is Glow, an open-source collaboration between the Regeneron Genetics Center® and Databricks. Built on Apache Spark™ and Delta Lake, Glow makes it easy to blend bioinformatics workflows with the open-source data science ecosystem.
In this on-demand workshop, we’ll walk through how the Databricks Unified Data Analytics Platform for Genomics makes it simple to deploy Spark-based bioinformatics tools in the cloud, accelerate common genomic analyses, and take advantage of machine learning techniques.
Join this hands-on workshop to learn how to: