With the exponential growth of genomic data sets, healthcare practitioners now have the opportunity to improve human outcomes at an unprecedented pace. These outcomes are difficult to realize in the existing ecosystem of genomic tools, where biostatisticians regularly chain together command-line interfaces based on a single-node setup on premise. The Databricks Unified Analytics Platform for Genomics empowers users to perform end-to-end analysis on our massively scalable platform in the cloud: in only minutes, a data scientist can visualize an individual’s disease risk based on their raw genomic data. Built on Apache Spark, we provide click-button implementations of accepted best practice workflows, as well as low-level Spark SQL optimizations for common genomics operations.
Karen Feng is a software engineer at Databricks, where she builds solutions for healthcare and life sciences. Before Databricks, she developed statistical algorithms for genomics at Princeton University.