Yogesh Pandit

Staff Software Engineer, Roche

Yogesh Pandit is a Staff Software Engineer in the Analytics Group within Diagnostics Information Solutions at Roche. Currently, he’s leading the NLP efforts to support the company’s NAVIFY platform, which aims to help oncology care teams review, discuss, and align on treatment decisions for the patient. Yogesh is a bioinformatician turned machine learning enthusiast with experience in biomedical NLP. For the past few years, he’s been building applications that leverage data in the life sciences and healthcare space.



Automated and Explainable Deep Learning for Clinical Language Understanding at Roche

Summit 2020

Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection. Recent advances in deep learning for NLP have enabled a new level of accuracy and scalability for clinical language understanding, making a broad set of applications possible for the first time.

The first part of this talk will cover the deep learning techniques, explainability features, and NLP pipeline architecture that have been applied. We'll provide a short overview of the key underlying technologies: Spark NLP for Healthcare, BERT embeddings, and healthcare-specific embeddings. Then, we'll describe how these were applied to tackle the challenges of a healthcare setting: understanding clinical terminology, extracting specialty-specific facts of interest, and using transfer learning to minimize the required amount of task-specific annotation. The use of MLflow and its integration with Spark NLP to track experiments and reproduce results will also be covered.

The second part of the talk will cover automated deep learning: the system's ability to train, tune, and measure models once clinical annotators add or correct labeled data. We will cover the annotation process and guidelines; why automation was required to handle the variety in clinical language across providers, document types, and geographies; and how this works in practice. Providing explainable results, including highlighting evidence in the text for extracted semantic facts, is another critical business requirement, and we'll show how we've addressed it. This talk is intended for data scientists, software engineers, architects, and leaders who must design real-world clinical AI applications and are interested in lessons learned from applying the latest advances in NLP and deep learning in this space.