Algorithms and Tools for Genomic Analysis on Spark

Hammer Lab has built several tools for analyzing genomic data on Spark, as well as libraries for more general computations using RDDs. I’ll discuss some of the most interesting applications and algorithms in them:
Guacamole is a somatic variant caller built on Spark; it identifies mutations in cancer genomes in a fraction of the time that comparable tools take.

Pageant contains miscellaneous other genomic analyses, along with a few novel algorithms for massively parallel Burrows-Wheeler Transform and FM-index construction.

Magic RDDs contains still more interesting general-purpose algorithms implemented on RDDs.