Nirav Kumar is the Director of Data Science at Halodoc. With 8+ years of experience across both Data Science and Data Engineering, Nirav is responsible for developing new insights, advanced modeling techniques, and prediction capabilities for various business use cases. Nirav is passionate about NLP, computer vision, and forecasting models.
May 28, 2021 11:40 AM PT
Background: Classifying diseases into ICD codes has relied mainly on humans reading large amounts of written material, such as discharge diagnoses, chief complaints, medical histories, and operation records, as the basis for classification. Coding is laborious and time-consuming: a professional disease coder takes about 20 minutes per case on average. An automatic code classification system can therefore significantly reduce human effort. ICD-10 (International Classification of Diseases, 10th Revision) is a classification of diseases, symptoms, procedures, and injuries. Diseases are often described in patients' medical records in free text, using terms, phrases, and paraphrases that differ significantly from those used in the ICD-10 classification.
Objectives: This paper aims to construct a machine learning model for ICD-10 coding that automatically determines the corresponding diagnosis codes based solely on free-text medical notes.
Methods: This paper applies Natural Language Processing (NLP) and a Recurrent Neural Network (RNN) architecture with a self-attention mechanism and transformers to classify ICD-10 codes from natural language text with supervised learning.
Results: Our predictions reach an F1-score of 0.82 on ICD-10-CM codes in experiments on extensive teleconsultation data.
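For readers less familiar with the reported metric, the following is a minimal sketch of how an F1-score can be computed when each note may carry one or more ICD-10 codes. It assumes micro-averaging over all (note, code) pairs; the abstract does not state which averaging scheme was used, and the function name `micro_f1` and the sample codes are purely illustrative, not taken from the study's data.

```python
def micro_f1(gold: list[set[str]], predicted: list[set[str]]) -> float:
    """Micro-averaged F1 over per-note sets of codes (averaging scheme is an assumption)."""
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        tp += len(gold_codes & pred_codes)   # codes predicted and correct
        fp += len(pred_codes - gold_codes)   # codes predicted but not in the gold set
        fn += len(gold_codes - pred_codes)   # gold codes the model missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted codes for three notes (illustrative only).
gold = [{"J06.9"}, {"K21.9", "R10.9"}, {"L20.9"}]
predicted = [{"J06.9"}, {"K21.9"}, {"L30.9"}]

score = micro_f1(gold, predicted)
print(f"micro F1 = {score:.3f}")
```

In this toy case there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3, recall is 1/2, and F1 is 4/7. A macro-averaged variant would instead compute F1 per code and average the results, which weights rare codes more heavily.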
Conclusion: The developed model can significantly reduce coding time and the human effort it requires, compared with manual coding by a professional coder.