Algorithms and Tools for Genomic Analysis on Spark - Databricks

Hammer Lab has built several tools for analyzing genomic data on Spark, as well as libraries for more general computations on RDDs. I'll discuss some of the most interesting applications and algorithms in them:
Guacamole (https://github.com/hammerlab/guacamole) is a somatic variant caller built on Spark; it identifies mutations in cancer genomes in a fraction of the time that comparable tools take.

Pageant (https://github.com/hammerlab/pageant) contains a range of other genomic analyses, along with a few novel algorithms for massively parallel construction of the Burrows-Wheeler transform (BWT) and the FM-index.

Magic RDDs (https://github.com/hammerlab/magic-rdds) contains further general-purpose algorithms implemented on RDDs.
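
For readers unfamiliar with the Burrows-Wheeler transform and FM-index mentioned under Pageant, here is a minimal single-machine sketch in Python of what those structures compute. It is not Pageant's algorithm and is not parallel: it uses the naive sorted-rotations construction, and the function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical ones chosen for illustration.

```python
from collections import Counter


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: sort all rotations, take the last column.

    Illustration only; this is quadratic in memory and not how a
    large-scale or distributed construction would work.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # "$" sorts before the letters A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string.

    C[c]      = number of characters in bwt_str strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    C = {}
    total = 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: count how often pattern occurs in the original text.

    n is the length of the BWT string (text length plus the sentinel).
    """
    lo, hi = 0, n  # half-open range of matching rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
C, occ = build_fm_index(transformed)
print(transformed)                                       # annb$aa
print(count_occurrences("ana", C, occ, len(transformed)))  # 2
```

The point of the index is that counting matches takes time proportional to the pattern length rather than the text length, which is why it is widely used for aligning short reads against a reference genome. Building it for genome-scale inputs is the expensive step, and that is the part a massively parallel construction targets.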