Clinical Suspecting at Scale Using PySpark


One of the key components of patient care is reducing the number of misdiagnosed or missed diseases. While the doctor may mention a condition, the full list of diseases may not have been documented properly during the risk adjustment process. We take a deep dive into the architectural, framework, and data decisions made along the way from exploratory analysis to algorithm deployment. We extract datasets from previous medical records, past medical history, drugs, laboratory results, use of medical equipment, procedures performed, and provider specialties.

While hunting for evidence of missing disease conditions, we have to make complex decisions around the following questions:

  1. Which framework to use?
  2. How to perform feature engineering at scale?
  3. How to tune PySpark configurations?
  4. How to ingest insights back into the pipeline?
  5. How to tune hyperparameters at scale?
  6. How to process data in parallel and handle out-of-memory (OOM) errors?

We had to devise a hybrid framework that uses both clinical rules and ML algorithms. In the end, we identify the patients with the highest likelihood of having incomplete diagnosis codes. In this talk, we take a deep dive into the above questions and discuss the roadblocks we faced (with examples) while building this platform. We also share key insights that any data scientist or ML engineer may find handy when dealing with similar data or problem statements.
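
To make the hybrid idea concrete, here is a minimal sketch in plain Python of how a clinical rule and a model score might be combined to rank patients. It is not the framework described in the talk. The `Patient` fields, the `diabetes_rule` function, the 0.5 rule weight, and the sample records are all hypothetical and chosen only for illustration. In a PySpark pipeline, the same logic would more likely be expressed as column expressions over a DataFrame.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical, simplified patient record for illustration only."""
    patient_id: str
    documented_codes: set   # diagnosis codes already on file
    drugs: set              # lower-cased drug names
    labs: dict              # lab name -> latest value
    model_score: float      # assumed ML probability of incomplete coding, in [0, 1]


def diabetes_rule(p: Patient) -> bool:
    """Illustrative rule: evidence of type 2 diabetes without an E11 code."""
    has_code = any(code.startswith("E11") for code in p.documented_codes)
    evidence = "metformin" in p.drugs or p.labs.get("hba1c", 0.0) >= 6.5
    return evidence and not has_code


# Hypothetical rule registry; a real system would hold many such rules.
RULES = [diabetes_rule]


def suspect_score(p: Patient, rule_weight: float = 0.5) -> float:
    """Blend a binary rule hit with the model score (the weight is an assumption)."""
    rule_hit = 1.0 if any(rule(p) for rule in RULES) else 0.0
    return rule_weight * rule_hit + (1.0 - rule_weight) * p.model_score


def rank_suspects(patients, top_n=10):
    """Return the top_n patients, most likely incomplete coding first."""
    return sorted(patients, key=suspect_score, reverse=True)[:top_n]


# Made-up sample records, not real patient data.
patients = [
    Patient("p1", {"I10"}, {"metformin"}, {"hba1c": 7.2}, 0.40),
    Patient("p2", {"E11.9"}, {"metformin"}, {"hba1c": 7.0}, 0.90),
    Patient("p3", {"I10"}, set(), {}, 0.10),
]
ranked = rank_suspects(patients)
```

In this sketch the rule only fires when evidence exists and the matching code is absent, so `p1` outranks `p2` even though `p2` has the higher model score. How rule hits and model scores should actually be weighted is a design decision that depends on the data and on clinical review.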


 
About Manas Ranjan Kar

Episource LLC

Manas Ranjan Kar is an Associate Vice President at US healthcare company Episource, where he leads the NLP and data science practice, works on semantic technologies and computational linguistics (NLP), builds algorithms and machine learning models, researches data science journals, and architects secure product backends in the cloud. He has architected multiple commercial NLP solutions in the areas of healthcare, food and beverages, finance, and retail. Manas is deeply involved in functionally architecting large-scale business process automation and in deriving deep insights from structured and unstructured data using NLP and ML.