YongSheng Huang is a resident solution architect at Databricks. His responsibilities include enterprise account service and strategic planning for technical solution products in health and life sciences. Prior to Databricks, he served as a senior research scientist at Merck & Co., where he was an analytics lead on advanced analytics and statistical algorithm development for drug discovery in immune disorders, oncology, and immuno-oncology. He also worked as a senior business intelligence engineer at Amazon Web Services, where he was responsible for statistical fraud prevention, anomaly detection, and business [...]. Yong received his PhD from the University of Michigan in 2010, with a focus on statistical learning theories and their applications in health and disease.
Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole-genome scale. This has had a profound impact on our understanding of human diversity, health, and disease. One promising application of WGS is to identify disease-causal genes that can be therapeutically targeted. However, the majority of disease-associated variants are located in non-coding regions, the so-called genetic deserts, so the exact function and biological consequences of these variants are unknown. In addition, because numerous variants are in linkage disequilibrium (LD), genetic sequence alone is insufficient to infer the likely causal variant(s) among the many variants in a region of association. Studies have shown that the majority of these variants reside in gene regulatory regions, preferentially in cell type-specific enhancers, which provides insight into their disease relevance. Novel sequencing technologies that map 3D genomic structure and build tissue-specific gene regulatory landscapes can link regulatory elements to their target genes. This allows us to connect disease-associated variants with their underlying gene targets. In this talk, we demonstrate a new approach that incorporates 3D genomic structure and the chromatin states of gene regulatory landscapes in a deep learning framework to predict the functions of disease-associated variants and their target genes. This approach can significantly improve our understanding of the functional importance of these otherwise uncharacterized genetic variants. It allows us to evaluate and prioritize high-impact variants and their target genes for the development of new drug interventions. Session hashtag: #SFmld2