Solution Accelerator
Automated PHI Removal
Pre-built code, sample data and step-by-step instructions ready to go in a Databricks notebook
Detect and protect sensitive patient data with NLP
HIPAA requires organizations to limit access to Protected Health Information (PHI). Removing PHI from unstructured data such as images and PDFs can be challenging and labor-intensive when done manually. Our joint Solution Accelerator with John Snow Labs automates the detection of sensitive information in unstructured data using NLP models for healthcare. Extracted text is stored in the Lakehouse, where teams can apply the pre-trained models to remove, obfuscate or mask PHI at scale before downstream analytics.
- Convert unstructured data like PDFs to structured text with OCR models
- Easily detect PHI using pre-trained NLP models for healthcare
- Automatically remove or de-identify PHI for downstream analysis
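The last step, turning detected PHI into de-identified text, can be illustrated with a small sketch. This is not the John Snow Labs or Databricks API. It assumes an upstream NLP model has already returned PHI entities as character spans with labels, and the `Entity` class and `deidentify` function are hypothetical names used only to show how masking, obfuscation and removal differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    """A PHI span, assumed to come from an upstream NER model (hypothetical shape)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "NAME" or "DATE"


def deidentify(text: str, entities: List[Entity], mode: str = "mask",
               surrogates: Optional[Dict[str, str]] = None) -> str:
    """Replace every detected PHI span in `text`.

    mode="mask"      -> replace the span with a <LABEL> placeholder
    mode="obfuscate" -> replace the span with a surrogate value for its label
    mode="remove"    -> delete the span entirely
    """
    if mode not in ("mask", "obfuscate", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    surrogates = surrogates or {}
    pieces = []
    cursor = 0
    # Walk the spans left to right, copying the text between them unchanged.
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            raise ValueError("overlapping entity spans are not supported")
        pieces.append(text[cursor:ent.start])
        if mode == "mask":
            pieces.append(f"<{ent.label}>")
        elif mode == "obfuscate":
            # Fall back to a mask if no surrogate is configured for this label.
            pieces.append(surrogates.get(ent.label, f"<{ent.label}>"))
        # mode == "remove": append nothing for the span
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "John Smith was seen on 2021-03-04."
    found = [Entity(0, 10, "NAME"), Entity(23, 33, "DATE")]
    print(deidentify(note, found))  # <NAME> was seen on <DATE>.
    print(deidentify(note, found, "obfuscate",
                     {"NAME": "Jane Doe", "DATE": "2000-01-01"}))
```

Masking keeps the entity type visible for downstream analysis, while obfuscation with realistic surrogates keeps the text readable; which one fits depends on how the de-identified data will be used.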