Dmitry is a Microsoft veteran who has worked at the company for more than 13 years. He started as a Technical Evangelist, and in this role presented at numerous conferences, including twice being on stage with Steve Ballmer. He then worked for 2 years as a Senior Software Engineer, helping big European companies start pilot digital transformation projects based on AI and ML. As a Cloud Developer Advocate, Dmitry focuses on creating educational content and working with academic and research institutions. He is also an Associate Professor at MIPT, HSE and MAI in Moscow, a big fan of functional programming and F#, and a maintainer/primary developer of the mPyPl library. In his spare time, Dmitry explores Science Art and Technological Magic, and performs Chinese tea ceremonies. He can be reached at http://soshnikov.com.
May 27, 2021 11:35 AM PT
In this session, we show how to leverage the CORD dataset, which contains more than 400,000 scientific papers on COVID and related topics, together with recent advances in natural language processing and other AI techniques, to generate new insights in support of the ongoing fight against this infectious disease.
The idea explored in our talk is to apply modern NLP methods, such as named entity recognition (NER) and relation extraction, to article abstracts (and, possibly, full text), in order to extract meaningful insights from the text and to enable semantically rich search over the paper corpus. We first investigate how to train an NER model using the Medical NER dataset from Kaggle and a specialized version of BERT (PubMedBERT) as a feature extractor, to allow automatic extraction of entities such as medical condition names, medicine names and pathogens. Entity extraction alone can provide some interesting findings, such as how approaches to COVID treatment evolved over time in terms of the medicines mentioned. We demonstrate how to use Azure Machine Learning for training the model.
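To make the "medicines over time" idea concrete, here is a minimal sketch of the post-processing step only, not the model used in the talk. It assumes the NER model emits one BIO tag per token; the tag name `MEDICINE`, the record fields (`date`, `tokens`, `tags`), the helper names and the placeholder drug names are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the entity in progress.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes the current entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


def mentions_by_month(papers, entity_type="MEDICINE"):
    """Count, per month, how many papers mention each entity of the given type."""
    counts = defaultdict(Counter)
    for paper in papers:
        month = paper["date"][:7]  # "YYYY-MM" prefix of an ISO date
        names = {
            text.lower()
            for text, kind in decode_bio(paper["tokens"], paper["tags"])
            if kind == entity_type
        }
        counts[month].update(names)  # a set, so each paper counts once per name
    return counts


# Toy input with placeholder names (assumed shape, not real CORD data).
papers = [
    {"date": "2020-03-15", "tokens": ["drug", "A", "was", "given"],
     "tags": ["B-MEDICINE", "I-MEDICINE", "O", "O"]},
    {"date": "2020-03-20", "tokens": ["drug", "A", "and", "drug", "B"],
     "tags": ["B-MEDICINE", "I-MEDICINE", "O", "B-MEDICINE", "I-MEDICINE"]},
    {"date": "2020-11-02", "tokens": ["drug", "C", "was", "studied"],
     "tags": ["B-MEDICINE", "I-MEDICINE", "O", "O"]},
]
counts = mentions_by_month(papers)
for month in sorted(counts):
    print(month, dict(counts[month]))
```

Counting each medicine once per paper, rather than once per mention, keeps a single long paper from dominating a month; whether that is the right choice depends on the question being asked.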
To take this investigation one step further, we also explore the use of pre-trained medical models, available as the Text Analytics for Health service on the Microsoft Azure cloud. In addition to many entity types, it can also extract relations (such as the dosage of a prescribed medicine), entity negation, and entity mappings to some well-known medical ontologies. We investigate the best way to use Azure ML at scale to score a large paper collection and to store the results.
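One possible shape for the scoring-and-storing step is sketched below, under stated assumptions: papers are sent to a scoring function in small batches, and each result is written as one JSON line so the output can be appended to and processed incrementally. The scoring function here is a hypothetical stand-in that only flags words ending in "mg"; a real run would replace it with a call to the service. The batch size, field names (`paper_id`, `abstract`, `entities`) and helper names are illustrative, not taken from the talk.

```python
import io
import json
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def score_corpus(papers, score_batch, out, batch_size=10):
    """Score papers batch by batch and write one JSON record per line to `out`."""
    written = 0
    for batch in batched(papers, batch_size):
        results = score_batch([p["abstract"] for p in batch])
        for paper, entities in zip(batch, results):
            record = {"paper_id": paper["paper_id"], "entities": entities}
            out.write(json.dumps(record) + "\n")
            written += 1
    return written


def fake_score_batch(texts):
    """Hypothetical stand-in for the real service: flags words ending in 'mg'."""
    return [
        [{"text": w, "category": "Dosage"}
         for w in text.split() if w.endswith("mg")]
        for text in texts
    ]


# Toy corpus (made-up abstracts, assumed record shape).
papers = [
    {"paper_id": f"p{i}", "abstract": f"Patients received {i * 100}mg daily"}
    for i in range(1, 6)
]
buffer = io.StringIO()  # stands in for a file or blob opened for writing
n = score_corpus(papers, fake_score_batch, buffer, batch_size=2)
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(n, records[0])
```

Keeping the scoring call behind a plain function argument makes it easy to swap the stub for the real service and to run the same loop in parallel over shards of the corpus.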