Introducing GlowGR: An industrial-scale, ultra-fast and sensitive method for genetic association studies
Today, we announce that we are making a new whole genome regression method available to the open-source bioinformatics community as part of Project Glow. Large cohorts of individuals with paired clinical and genome sequence data enable unprecedented insight into human disease biology. Population studies such as the UK Biobank, Genomics England, or Genome Asia...
Introducing Glow: An Open-Source Toolkit for Large-Scale Genomic Analysis
The key to solving some of today’s most challenging medical problems lies in the analysis of genomics data. Understanding the impact of the minor changes in an individual’s genome on their overall health is fundamentally a data-driven challenge that requires integration across hundreds of thousands of individuals. By analyzing genomes across large cohorts, researchers...
Parallelizing SAIGE Across Hundreds of Cores
As population genetics datasets grow exponentially, it is becoming impractical to work with genetic data without leveraging Apache Spark™. There are many ways to use Spark to derive novel insights into the role of genetic variation in disease processes. For example, Regeneron works directly on Spark SQL DataFrames, and the open-source Hail package can be...
Accurately Building Genomic Cohorts at Scale with Delta Lake and Spark SQL
This is the second post in our “Genomic Analysis at Scale” series. In our first post, we explored a simple problem: how to provide real-time aggregates when sequencing large volumes of genomes. We solved this problem by using Delta Lake and a streaming pipeline built using Spark SQL. In this blog, we focus on the more advanced...
A Summer of Personal and Professional Growth at Databricks
This summer, I worked at Databricks as a software engineering intern on the Growth team. By introducing two new features, user groups and API tokens, I simplified the user management experience and improved security for API authentication. In this blog, I briefly discuss their use and merits and share my personal experience as an intern...