Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection. Recent advances in deep learning for NLP have enabled a new level of accuracy and scalability for clinical language understanding, making a broad set of applications possible for the first time.
The first part of this talk will cover the deep learning techniques, explainability features, and NLP pipeline architecture that have been applied. We’ll provide a short overview of the key underlying technologies: Spark NLP for Healthcare, BERT embeddings, and healthcare-specific embeddings. Then, we’ll describe how these were applied to tackle the challenges of a healthcare setting: understanding clinical terminology, extracting specialty-specific facts of interest, and using transfer learning to minimize the required amount of task-specific annotation. We’ll also cover the use of MLflow and its integration with Spark NLP to track experiments and reproduce results.
The second part of the talk will cover automated deep learning: the system’s ability to train, tune, and measure models once clinical annotators add or correct labeled data. We will cover the annotation process and guidelines; why automation was required to handle the variety in clinical language across providers, document types, and geographies; and how this works in practice. Providing explainable results, including highlighting evidence in the text for extracted semantic facts, is another critical business requirement, and we’ll show how we’ve addressed it. This talk is intended for data scientists, software engineers, architects, and leaders who must design real-world clinical AI applications and are interested in lessons learned from applying the latest advances in NLP and deep learning in this space.
David Talby is a chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life sciences, and related fields. David has extensive experience in building and running web-scale data science and business platforms and teams: in startups, for Microsoft's Bing Shopping in the US and Europe, and for Amazon's financial systems in Seattle and the UK. David holds a PhD in computer science and master's degrees in both computer science and business administration.
Vishakha Sharma is a principal data scientist for diagnostic information solutions at Roche, where she leads advanced analytics initiatives such as natural language processing (NLP) and machine learning (ML) to discover key insights that improve the NAVIFY product portfolio, leading to better and more efficient patient care. Vishakha has authored 40+ peer-reviewed publications and proceedings and has given 15+ invited talks. She serves on the program committees of ACM-W, NeurIPS, AMIA, and ACM-BCB. Her research has been funded by the NIH Big Data to Knowledge (BD2K) initiative to build NLP software for precision medicine. She holds a PhD in computer science.
Yogesh Pandit is a staff software engineer in the Analytics Group within Diagnostics Information Solutions at Roche. Currently, he’s leading the NLP efforts to support the company’s NAVIFY platform, which aims to help oncology care teams review, discuss, and align on treatment decisions for the patient. Yogesh is a bioinformatician turned machine learning enthusiast with experience in biomedical NLP. For the past few years, he’s been building applications that leverage data in the life sciences and healthcare space.