My life’s work is in modeling computational systems and translating them into practical applications in genetics. I have built genome annotation engines that are used to better understand microorganisms, with applications in bioenergy, bioterrorism, and monitoring and preventing clinical outbreaks. I have also applied these skills to building novel annotation engines for analyzing human variation. In a clinical setting, we need to know whether a variant is likely to cause disease or simply occurs naturally in the population, and we need to filter millions of variants to find those most likely to cause disease.
Today, healthcare systems are engineered to handle only the case at hand (N=1): they display the data for a given patient, retrieve that patient’s records, and store new data back into the record. Healthcare providers are expected to keep in their heads the history and outcomes of every patient they see. Specialty healthcare institutions, like Mayo, rely on this knowledge in each and every doctor to provide the best possible care to patients with rare and uncommon conditions. However, no matter how brilliant the mind, healthcare providers cannot be expected to make high-quality comparisons under the data deluge found in genomics. Variants alone account for roughly 5 million differences between patients. Analytic tools are required to make such large datasets understandable to researchers and clinicians, and these tools have unprecedented performance requirements.
In the future, individualized medicine will require physicians to use data from previous similar cases to optimize the care of the current individual (N+1). Two capabilities are needed to deliver value: (1) real-time, data-driven tools at the point of care that improve outcomes, and (2) advanced analytics for individual researchers and consortiums that enable more data-driven exploration and discovery to improve care. This presentation will discuss nine important data management problems that arise once variants have been identified (using tools such as ADAM on Spark). Our group is using Spark to solve each of these problems. We are seeking engagement from the open source community to help us solve them and to provide an open platform for personalized genomics.