The speaker will review case studies from real-world projects that built AI systems using Natural Language Processing (NLP) in healthcare. These case studies cover projects that deployed automated patient risk prediction, automated diagnosis, clinical guidelines, and revenue cycle optimization. He will also cover why and how NLP was used, what deep learning models and libraries were used, and what was achieved. Key takeaways for attendees will include important considerations for NLP projects, including how to build domain-specific healthcare models and how to use NLP as part of larger, scalable machine learning and deep learning pipelines in a distributed environment.
– Hello everybody! Today we are going to talk about Spark NLP for healthcare, and the lessons that we learned at John Snow Labs while building real world healthcare AI systems. I’m Veysel Kocaman, a senior research scientist at John Snow Labs, and we will spend the next forty minutes together.
So, today we are going to start by introducing Spark NLP, what’s going on with the Spark NLP library, and what we introduced in the last few weeks, and then we will focus more on the problem areas in healthcare analytics, and how to solve healthcare related problems through NLP. If we still have some time, we will also cover some case studies.
So, Spark NLP was introduced to the open source community in 2017. We were planning to provide a single unified solution for all your NLP needs, sitting on the shoulders of Spark ML, Spark itself actually. So, we were trying to build a library that has no dependencies other than Spark itself. We take advantage of transfer learning and implement the latest and greatest state of the art algorithms and papers in NLP research in our own library. It’s already being used by several Fortune 500 companies, and we have an active development team which is releasing every two weeks.
In Spark NLP, we have two main packages, plus Spark OCR. Right now I’m just going to focus on the two packages. On the left hand side, you see our Enterprise module, which is licensed. On the right hand side, you see our public module. At the moment, we support up to two languages, and we have more than ninety pretrained models and more than seventy pretrained pipelines that you can just plug and play. The main difference between the enterprise and public versions is the modules that are dedicated to the enterprise version, like the healthcare models that we train and publish every week. So, we take public or handcrafted clinical data sets, we label them, we train new models, and then we put them into our enterprise library, in a licensed environment. In the enterprise version, like the healthcare version, we have clinical entity recognition models to extract clinical entities, and then link the entities back to normalized codes like (indistinct).
We also have assertion status to detect the presence of a clinical entity, like whether it is related to someone else or present at the moment, and we also have the de-identification model to identify sensitive patient information.
On the right hand side, you see the public version. In the public version we also have NER models, not the clinical ones but the public ones, and we also have several state of the art algorithms already implemented, like sentiment analysis, patient information extraction, and especially spell checkers. So you get to use all of them with the public version for free.
Spark NLP is already trusted by the companies that you see on this slide, and these are only the companies that we are aware of that are using Spark NLP for their work.
So, Spark NLP is supported in four different languages: Python, Java, Scala, and R. When we say state of the art, we really mean it, because we implement the latest state of the art papers in NLP research, write our code in Scala, wrap it in Spark NLP, and then share it with the open source community. Right now, when you switch from one language to another you need zero code change. Actually, the syntax is very similar, as you can see on the right hand side with the Java, Scala, and Python scripts. They use almost the same syntax, and when you try to scale your pipeline out to a cluster you don’t need to make any other changes, because what works in local mode also works on the cluster as well. It’s natively distributed, a native Spark library actually.
According to the recent surveys, Spark NLP is still the most widely used NLP library in the industry, especially the healthcare industry.
These are the results coming from the last two years’ surveys. Two years in a row we were the most widely used NLP library in industry. And we cover nearly all of the NLP tasks that the other public libraries cover. On top of that, we also cover some spell checkers and clinical NER models that the other libraries do not cover at the moment.
As I said before, we are sitting on the shoulders of Spark ML, which means that the models that we create in Spark NLP can be combined with Spark ML transformers or estimators in the same Spark pipeline. So, anything that comes from Spark ML can also be used with Spark NLP, and anything that is built on the Spark NLP side can also use Spark ML. So, sometimes we do the text processing in Spark NLP and then use some other machine learning modules from Spark ML, or vice versa.
First use Spark ML and then use some state of the art NER models (indistinct) Spark NLP. And then put all the annotators, the transformers, and the estimators in the same pipeline, and then save it and reuse it later on. So, that’s the beauty of being very related to…
Living in the same environment as Spark ML.
In Spark NLP, we have two types of embeddings: clinical embeddings and public embeddings. On the public embeddings side, we already cover multiple versions of GloVe embeddings.
ELMo, we have two versions of ELMo embeddings, and BERT embeddings, and we also have Universal Sentence Encoders. BERT and ELMo, as you already know, are the new kids in town that have been shaking the NLP world for the last few years, and actually it has been just one year since BERT was released. ELMo and BERT are context-aware word embeddings, so we get different embeddings depending on the context in which the word is being used. Apart from GloVe, ELMo, and BERT, we have Universal Sentence Encoders, which are able to create sentence embeddings for entire sentences or documents. With the other embeddings, we need to average the word embeddings to get a sentence embedding.
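To make the last point concrete, here is a minimal sketch of averaging word embeddings into a sentence embedding. It is plain Python, not Spark NLP's API; the vectors in `TOY_VECTORS` and the helper `average_embedding` are made up for illustration.

```python
from typing import Dict, List

# Toy 3-dimensional word vectors; the values are invented for illustration.
TOY_VECTORS: Dict[str, List[float]] = {
    "chest": [1.0, 0.0, 2.0],
    "pain": [3.0, 2.0, 0.0],
}


def average_embedding(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []  # no known token, so no sentence vector
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# "today" has no vector in the toy table, so only two words are averaged.
sentence_vector = average_embedding(["chest", "pain", "today"], TOY_VECTORS)
print(sentence_vector)
```

A sentence encoder skips this step because it produces one vector per sentence directly.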
On the clinical side, we have the following word embeddings: Clinical GloVe, which we trained using the PubMed and PMC databases, our own enriched clinical embeddings, which we call ICDO Glove, and two types of BERT, Clinical BERT and BioBERT. Actually, we have six variants of clinical BERT (indistinct).
Starting last week, we decided to move BioBERT and Clinical BERT to the public version, so they are not licensed anymore. So, anyone can use them as open source, and you can train your NER model with BioBERT or train your (indistinct) using Clinical BERT. It’s all up to you. It’s all free right now. So, how do we build the pipelines within Spark NLP?
So, Spark NLP is like building blocks. Whatever you need for your use case, you can select any annotator from Spark NLP or Spark ML itself and put it inside the pipeline, and they will be linked together: the output of one annotator will be used as the input of another annotator. On the slide, you see that there’s a data frame with a column of text. So, at first, we use the document assembler, which is the entry point for every Spark NLP pipeline. We take the text and create a document column, then we use the sentence detector, which creates the sentences, then the tokenizer, and so on. Which means that, at the end, we will have a different column for each annotator that we used in the same pipeline. Then we fit and then we transform, as we do with Spark ML.
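The chaining idea can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Spark NLP code: the stage functions below are hypothetical stand-ins for the document assembler, sentence detector, and tokenizer, and the sentence splitting is deliberately naive.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Stage = Callable[[Row], Row]


def document_assembler(row: Row) -> Row:
    # Entry point: wrap the raw text in a "document" column.
    return {**row, "document": str(row["text"]).strip()}


def sentence_detector(row: Row) -> Row:
    # Naive split on periods; a real sentence detector is far more robust.
    parts = str(row["document"]).split(".")
    return {**row, "sentence": [p.strip() for p in parts if p.strip()]}


def tokenizer(row: Row) -> Row:
    # Whitespace tokenization of each sentence produced by the previous stage.
    return {**row, "token": [s.split() for s in row["sentence"]]}


def run_pipeline(stages: List[Stage], row: Row) -> Row:
    # Each stage reads columns written by earlier stages and adds its own.
    for stage in stages:
        row = stage(row)
    return row


result = run_pipeline(
    [document_assembler, sentence_detector, tokenizer],
    {"text": "Patient denies fever. Reports chest pain."},
)
print(list(result.keys()))
print(result["token"])
```

Every stage leaves the earlier columns in place, so the final row carries one column per stage, which mirrors the one-column-per-annotator layout described above.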
So, this is how it works in production actually. On the slide, you see two different pretrained pipelines. Pretrained pipelines are chains of modules that we have already trained ourselves and then uploaded so that you can just download and use them. You can find their names by checking our Spark NLP models GitHub page. For example, on the left hand side you just pull the explain document DL pipeline, which uses multiple annotators in Spark NLP like stemmers, spell checkers, (indistinct), NER. So, instead of using them one by one, we have a pretrained pipeline that you can just plug and play to annotate your text. These pretrained pipelines run on the LightPipeline concept in Spark NLP.
So, it’s much faster on a single machine. On the right hand side you see explain document, the clinical version. In that version we have the clinical NER and the clinical assertion model in the same pipeline. So, you don’t need to build a pipeline; we have the pipeline for you. You just load it and feed it your clinical text, and at the end you get three types of entities: problem, test, and treatment. And then you get the assertion status for each one of them: whether that clinical entity is contextually present in the sentence, or whether there is something mentioning the absence of that entity. So, we can figure out the clinical context instead of using lookup tables.
As you already know, Spark NLP is built on Spark itself, for big data. It’s like a locomotive actually. People sometimes complain that it’s slow when they try to process their text in Spark, but it’s like a locomotive racing a bicycle. The bike will win if the load is light: it’s quicker to accelerate and more agile. With a heavy load, the locomotive might take a while to get up to speed, but it’s going to be faster at the end. So, for faster inference at runtime, we also developed the LightPipeline concept: you fit your pipeline, get the pipeline model, and put it into a LightPipeline, which removes the overhead of Spark itself and is much faster on a single machine.
Spark NLP is developed by John Snow Labs. John Snow Labs is an award winning healthcare analytics company that has won several awards over the last few years. We are mainly focused on the healthcare side, but we also build a lot of public modules, which our development team releases every two weeks.
And the nature of data in healthcare is a lot different from what we experience in other domains. People usually expect to deal with clean and structured data, but in reality we usually deal with raw, unstructured data. On top of that, when we talk about healthcare data, it’s way worse. According to recent research, less than 50% of the structured data and less than 1% of the unstructured data are being leveraged by decision makers. In healthcare it’s even worse, which means that the healthcare industry is not able to leverage the data that it is sitting on right now. And NLP is an ultra domain specific field, which means that every domain, every industry, if they want to get the most out of the latest NLP research, needs to train its own models using its own data.
So, in the healthcare domain, the origination and the exchange of data is a lot more complex than in other industries, because we have data sources that are sensitive, hard to collect, and hard to document, and the language itself is very unique actually. It’s hard to understand that language, even for native English speakers. On top of that, the data is being exchanged across multiple sources in a single hospital. For example, assuming this is a hospital’s data generation cycle, the data is everywhere right now. So, it’s our job to put it together and then get some insights out of it.
So, the language itself is the language used by specially trained physicians, nurses, and other healthcare providers. In NLP, we try to extract features from those clinical texts so that we can use the features to build something else, something bigger, in a downstream task. The communication between a patient and a physician would look like nonsense at first to any public model, but our job is to extract the relevant information from that text, create the features, and then use those features to build another model to help decision makers.
This is a very simple text that you can see in any healthcare notes. This is a sample from the MIMIC-III data set, which has been maintained by MIT for the last few years. When you apply other NLP sentence detector or tokenizer packages, they will definitely fail, because they are not trained on healthcare data and they are not able to understand the differences in healthcare data, healthcare notes, or healthcare metadata. So, our purpose is to train those modules and use them to process medical data.
So, we have one single goal: extract as much valuable information as possible from any single text in the clinical domain. It could be entity recognition. It could be a sentiment score. It could be some normalized code like ICD-10 or (indistinct) codes. It could be negation detection of a clinical entity, or the name of the drug or the dosage of the drug. So, our job is to be able to extract every bit of information hidden or covered in nasty clinical texts.
Right now we have four flagship features in our Spark NLP healthcare module. The first one is NER, named entity recognition. We have more than two hundred clinical NER models that are trained using (indistinct) detectors, which is state of the art at the moment. We have word embeddings that are specifically trained on clinical data and are able to capture the relationships between the features, instead of using the public embeddings. We have assertion status, which I just showed, which defines the negation status of a given entity to find whether it’s present or absent, or maybe related to some family member.
And the last one is entity resolution. Entity resolution is normalizing your data by assigning a UMLS, SNOMED, or ICD-10 code, so that you can standardize or normalize your clinical text or treatment procedures for reporting purposes like insurance or some other purposes.
So, how does the named entity recognition model work in Spark NLP? We are using the (indistinct) architecture, and we can use any word embeddings in the middle. For the public use case you can use GloVe or BERT, and for the clinical use case the clinical embeddings, in the middle. And then we create embedding features for each word, and then train our NER-DL model, which also uses the CHAR (indistinct). Right now we are in the top three on these benchmarks, and we are at the top among production-grade libraries that provide these benchmarks.
These are the clinical named entity recognition models. As I said, we have more than two hundred different NER models. You just plug and play. You don’t need to train. These are already trained on MIMIC data sets, PMC data sets, or articles. On the left hand side, you see our first NER model, which is the clinical NER. You can find the names of all of them on the Spark NLP models GitHub page. The clinical NER model is trained on problem, treatment, and test entities. On the right hand side, you see the Posology NER, which is responsible for extracting the frequency, dosage, and duration of a medication. The anatomy NER is able to extract body parts or some metabolism related entities from the text, and we have the PHI NER to extract sensitive information or cover (indistinct).
You can mask the sensitive information after finding it with NER, or obfuscate it by replacing it with fake entities. So, NER produces the lowest-level meaningful chunk inside Spark NLP, which means that everything coming after NER is built upon NER. So, at first we run the NER models and extract the entities, and then we apply entity resolution, assertion status detection models, and so on.
So, when we compare ourselves with Amazon Comprehend Medical, we are doing comparatively better, and we are able to randomly test conclusions.
We are able to extract more entities than Amazon Comprehend Medical does.
The clinical assertion model makes it possible to detect the negation status of a given entity. For example, in the first row you see that there is a mention of influenza, and as a human, when I read that, I see that the condition is present. But an entity can also be mentioned conditionally. In the second sentence you see atypical CP per week with rest and exertion, which means that it’s conditional. It’s not always there. It just happens with rest and exertion. And in the third row you see that "came back clean" sounds positive, but it means absent when used in a clinical context.
That test is clean actually. So, this is another deep learning model that we trained on clinical notes using clinical (indistinct) data sets.
You see another representation of the clinical assertion model. At first we extract the entities, and every entity is assigned an assertion status label by passing it through the assertion model. We get a label for each entity, which means that you can use any NER before the assertion model. So, you can just use the Posology NER or PHI NER or Anatomy NER.
You can even use a public NER if you want to find the negation status related to that entity. You can just chain NER and assertion together as long as they both use the same embeddings.
So, this is the clinical de-identification model. According to HIPAA rules, like the Safe Harbor approach, we need to cover or hide some of the sensitive information in clinical texts so that we can share them with the public, or the hospitals can share them with the public or with stakeholders. Our job is to be able to extract every bit of sensitive information and then hide or mask it.
If you want to preserve the consistency of a clinical text, you can obfuscate it with random doctor names and random hospital names. We have an obfuscation model in Spark NLP as well.
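The difference between masking and obfuscation can be sketched as follows. This is plain Python, not the Spark NLP de-identification annotator; the sample note, the `span` and `deidentify` helpers, and the replacement values in `FAKE_VALUES` are all hypothetical, and the entity spans are assumed to come from an upstream PHI NER step.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start, end, label), end exclusive

# Hypothetical replacement values used when obfuscating instead of masking.
FAKE_VALUES: Dict[str, str] = {
    "DOCTOR": "Dr. Jane Doe",
    "HOSPITAL": "General Hospital",
}


def span(text: str, phrase: str, label: str) -> Entity:
    # Stand-in for an NER result: locate the phrase and attach a label.
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def deidentify(text: str, entities: List[Entity], obfuscate: bool = False) -> str:
    """Replace each entity span with a <LABEL> mask or with a fake value."""
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])
        if obfuscate and label in FAKE_VALUES:
            pieces.append(FAKE_VALUES[label])
        else:
            pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen by Dr. Smith at Mercy Clinic."
entities = [span(note, "Dr. Smith", "DOCTOR"), span(note, "Mercy Clinic", "HOSPITAL")]

masked = deidentify(note, entities)
obfuscated = deidentify(note, entities, obfuscate=True)
print(masked)
print(obfuscated)
```

Masking makes it obvious that something was removed, while obfuscation keeps the note readable as natural text, which matters if the note is later fed to other models.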
And entity resolvers are able to… We have three different entity resolvers at the moment, and different variants of them, more than ten maybe: ICD-10, SNOMED, and (indistinct). They are all able to assign a normalized code, like an ICD or (indistinct) code, to any clinical entity that is detected in the same pipeline, so that you can formalize your clinical notes. You don’t need to hire medical coders to assign ICD-10 codes to your clinical notes, and you can use those ICD-10 codes for reporting purposes.
It works similarly to assertion, because it’s the same pipeline. We just use NER. On top of NER, we add the entity resolver annotators, and the resolvers take the entities from the NER model and assign codes according to the algorithm that is implemented in the entity resolver annotators.
On the left hand side you see RxNorm Codes. On the right, you see ICD10 codes for each problem. So, if you say that I just want to assign codes for problem entities, you can just define that one or you can just say that I want to ignore the treatment entities. I just want to focus more on drug ones because RxNorms are more related to drug entities. So, you can just specify that as well. It’s very flexible.
So, we still have five minutes, so let’s cover some case studies. We have four case studies; we may not have time to cover all of them, but the slides will be there so you can check. The first one is SelectData. This is a real project by the way. All these projects are real case studies that have been implemented for our customers. SelectData is a company that is responsible for converting the clinical notes that are recorded by home health providers. So, the physicians go to the patient’s home and provide some services, and they take notes and then report them for insurance purposes. Our job is to be able to extract ICD-10 codes out of those texts. There are already many problems in that domain: fewer people, fewer qualified workers, and less money available to hire medical coders. So, we try to automate this whole process through Spark NLP. We first get the PDFs and convert them into a readable format through Spark OCR, which is specifically designed for medical notes, (indistinct), or PDFs, so that it can handle the typos and the structure of medical documents. And then we put the processed txt files coming out of OCR into the Spark NLP pipeline, which is the entry point in this sample, and then we start processing our text with the tokenizer, spell checker, NER, and other models like assertion status or entity resolvers. Which means that everything starts as a PDF or image, and we end up with useful entities that are already normalized.
As you see, at first we extract tokens. There are some typos, as you can see: comonary and altery are typos, so they need to be fixed. We have a spell checker in the same pipeline, so they are fixed according to the context. We don’t want to destroy words that are already correct. That’s why using a spell checker is quite unique in NLP processing, and contextual spell checking is a recent addition to Spark NLP. (indistinct) context spell checker, we call it. (indistinct) So, we converted comonary to coronary and altery to artery, which are valid clinical words.
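As a rough illustration of the "fix typos but do not touch correct words" idea, here is a much simpler technique than the contextual spell checker described in the talk: a vocabulary lookup with edit distance. It is plain Python, not Spark NLP; the three-word `VOCAB` and the functions `edit_distance` and `correct` are assumptions made for this sketch, and it ignores context entirely.

```python
from typing import List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Tiny hypothetical vocabulary of valid clinical words.
VOCAB: Set[str] = {"coronary", "artery", "disease"}


def correct(tokens: List[str], vocab: Set[str], max_distance: int = 1) -> List[str]:
    fixed: List[str] = []
    for tok in tokens:
        if tok in vocab:
            fixed.append(tok)  # already valid: never rewrite a correct word
            continue
        candidates = sorted(w for w in vocab if edit_distance(tok, w) <= max_distance)
        # Only replace when there is exactly one close candidate.
        fixed.append(candidates[0] if len(candidates) == 1 else tok)
    return fixed


corrected = correct(["comonary", "altery", "disease"], VOCAB)
print(corrected)
```

A contextual checker goes further by using the surrounding words to choose among several plausible candidates, which this sketch cannot do.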
And then we detected the entities. Since our NER models are CHAR based as well, even if there’s a typo, our NER-DL approach is able to detect the pattern and assign an entity label.
We just get the location of each entity so we can use it in the rest of the pipeline. Now that we have the entities, it’s time to assign ICD-10 codes. We are using our entity resolver models. A resolver takes the chunks from the NER, gets the chunk embeddings using the same clinical embeddings, and then passes them through the entity resolution algorithm to assign ICD-10 codes. This all happens inside Spark NLP, without leaving it.
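The talk does not spell out the resolution algorithm, so the following is only one plausible sketch: assign each chunk the code whose reference vector is closest by cosine similarity, and only resolve chunks whose entity label is in an allowed set. Everything here is hypothetical, including the placeholder codes `CODE-A` and `CODE-B` (deliberately not real ICD-10 codes), the toy vectors, and the function names.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Placeholder codes with made-up reference vectors (not real ICD-10 codes).
CODE_VECTORS: Dict[str, List[float]] = {
    "CODE-A": [1.0, 0.0],
    "CODE-B": [0.0, 1.0],
}


def resolve(chunk_vector: List[float], code_vectors: Dict[str, List[float]]) -> str:
    # Pick the code whose reference vector is most similar to the chunk.
    return max(code_vectors, key=lambda code: cosine(chunk_vector, code_vectors[code]))


Chunk = Tuple[str, str, List[float]]  # (text, entity label, chunk embedding)


def resolve_chunks(
    chunks: List[Chunk],
    code_vectors: Dict[str, List[float]],
    allowed_labels: Set[str],
) -> Dict[str, str]:
    # Only chunks with an allowed entity label are sent to the resolver.
    return {
        text: resolve(vector, code_vectors)
        for text, label, vector in chunks
        if label in allowed_labels
    }


chunks: List[Chunk] = [
    ("coronary artery disease", "PROBLEM", [0.9, 0.1]),
    ("aspirin", "TREATMENT", [0.2, 0.8]),
]
codes = resolve_chunks(chunks, CODE_VECTORS, allowed_labels={"PROBLEM"})
print(codes)
```

Restricting by label is the same idea as telling a resolver to code only problem entities and ignore treatments.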
And the second case study is Roche. Roche, as you already know, is a big healthcare provider and pharmaceutical company, and they have some hand created reports that they need to automate. Our job was to replicate this hand crafted feature extraction with named entity recognition, using our NER algorithm. So, we labeled their data. We hired annotators from the medical domain, defined the annotation guidelines, labeled the data, and then trained an NER model. After three weeks, we were able to get decent results, replicating the human thought process of circling, labeling, or highlighting those analyses. That’s what we do actually.
The goal was to be able to speed up the review of pathology reports and then the extraction automation, and then meeting this pipeline over time.
So, we started with a PDF as well. If the documents are already txt, we just skip the OCR step. Then we feed them into our pipeline, extract entities with NER, and then apply entity resolution or assertion. You can use these clinical models for any other purpose, for example if you want to build some clinical kind of (indistinct) to detect whether a patient would develop metastasis or some other illness over time. So, you can extract all these features from your text through NER, assertion, entity resolver, like (indistinct), and then use them in your downstream task to build any other model.
At the end, we were able to analyze domain specific PDFs by training our OCR modules and then training custom NER models for them, and we were able to replicate the process inside Spark NLP.
The third one is Kaiser Permanente. Their goal was to improve patient flow forecasting. It’s a huge company actually. They have more than forty hospitals. So, our job was to extract NLP features from this flow so that we could use these features to build a patient flow forecasting model.
And there are other targets that we need to deal with, like nurse staffing levels, bed demand levels, and patient flow. Those are all the objectives that we need to focus on when extracting those features. This problem can be solved by building a second layer model, so we need features. We already have many features coming from the records, but we don’t have features coming from the NLP side. So, our job was to extract those features from the notes, as you can see on your screen. There are some notes that are hidden inside some garbage, and they are handwritten notes between patients and nurses. These are notes that otherwise could not be leveraged in the later steps of the model. So, our job was to implement those features.
And the last one is using NLP for clinical trials. In clinical trials, we tried to shrink the rectangular area of that clinical trial so that we can help the companies justify the time spent on those clinical trials, because they have just a few years to launch their products. So, our job is to speed up those clinical trials, which means that they need to find the right patients with the right attributes, like sicknesses, diagnoses, or some specific condition. So, what we did is, instead of… Let’s just skip these parts. Looking patients up in lookup tables is not enough, because these patients show signs of cancer over time and may develop different stages at different times. The sickness might be gone when we find them and call them for the clinical trial, or it could still be there. That means we need to check entire documents across multiple years and find out whether the problem is still there and whether it is actually about that specific patient, so that we can find the right patient for the right treatment or the right trial.
If they just used lookup tables to find those patients by checking their history, they would find all the patients represented by dots on the slide. When you check their assertion status, you will see that some of the problems are absent. Some of the diagnoses are associated with someone else. So, it would be wrong and time consuming to invite those patients if they are not relevant to the clinical trial.
So, we applied our assertion status model, ended up with that narrower set of patients, and dropped the rest, which is a huge time saver and a cost effective solution for that company.
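The contrast between a lookup table and assertion-aware filtering can be sketched like this. It is a plain Python illustration with made-up patient records; the `Mention` class, the assertion label strings, and both functions are assumptions for this sketch rather than output of the actual assertion model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    patient_id: str
    entity: str
    assertion: str  # e.g. "present", "absent", "associated_with_someone_else"


def lookup_patients(mentions: List[Mention], target: str) -> List[str]:
    # Lookup-table behaviour: any mention of the term counts as a match.
    return sorted({m.patient_id for m in mentions if m.entity == target})


def eligible_patients(mentions: List[Mention], target: str) -> List[str]:
    # Assertion-aware behaviour: keep only mentions asserted as present.
    return sorted(
        {m.patient_id for m in mentions if m.entity == target and m.assertion == "present"}
    )


# Made-up records for illustration only.
mentions = [
    Mention("p1", "cancer", "present"),
    Mention("p2", "cancer", "absent"),
    Mention("p3", "cancer", "associated_with_someone_else"),
    Mention("p4", "cancer", "present"),
]

all_hits = lookup_patients(mentions, "cancer")
candidates = eligible_patients(mentions, "cancer")
print(all_hits)
print(candidates)
```

In this toy data the lookup returns four patients while the assertion filter keeps two, which is the kind of narrowing described above.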
So, we are at the end of our presentation. There are many resources that we would like to share with you. You can check our Medium page and our Spark NLP Workshop repo. We also started certification trainings for Spark NLP data scientists and Spark NLP healthcare data scientists, and we share NLP 3 in the workshop repo.
To be able to use the clinical versions, of course, you will need a license, but you can just review them on the repo. It’s free on Colab. You don’t need to install anything. Everything’s in Colab. You can just run it over there. We also have some recent articles and blog posts talking about text (indistinct) with Spark NLP using (indistinct), or you can do the same for NER: how you can label your data, how you can train your NER in Spark NLP, and decode. We have many resources that you can start
and play with. And we have a Spark NLP Slack channel that is very active, and we reply to questions over there. So, it was my honor to be with you. I hope that you liked this, and I hope to stay in touch. You can reach me.
I’m available on social media like LinkedIn and Twitter. So, you can reach me, Spark NLP, or John Snow Labs anywhere you want. I’d be more than happy to help you.
John Snow Labs
Veysel is a Senior Data Scientist at John Snow Labs, lecturer at Leiden University and a seasoned data scientist with a strong background in every aspect of data science including machine learning, artificial intelligence and big data with over ten years of experience. He is also working towards his PhD in Computer Science and is a Google Developer Expert in Machine Learning.