Timothy Poterba - Databricks

Timothy Poterba

Engineer and Computational Biologist, Broad Institute of MIT and Harvard

Tim Poterba is an engineer and computational biologist on the Hail team at the Broad Institute of MIT and Harvard. Prior to joining the Broad, he studied protein folding dynamics at the Max Planck Institute for Biochemistry on a Fulbright Scholarship. He received his B.A. in Biophysics from Amherst College in 2013.



Scaling Genetic Data Analysis with Apache SparkSummit 2017

In 2001, it cost ~$100M to sequence a single human genome. In 2014, due to dramatic improvements in sequencing technology far outpacing Moore’s law, we entered the era of the $1,000 genome. At the same time, the power of genetics to impact medicine has become evident. For example, drugs with supporting genetic evidence are twice as likely to succeed in clinical trials. These factors have led to an explosion in the volume of genetic data, in the face of which existing analysis tools are breaking down. As a result, the Broad Institute began the open-source Hail project (https://hail.is), a scalable platform built on Apache Spark, to enable the worldwide genetics community to build, share and apply new tools. Hail is focused on variant-level (post-read) data; querying genetic data, as well as annotations, on variants and samples; and performing rare and common variant association analyses. Hail has already been used to analyze datasets with hundreds of thousands of exomes and tens of thousands of whole genomes, enabling dozens of major research projects. Session hashtag: #SFr1