In today’s age of data exploration, Apache Spark has become the de facto platform for processing large volumes of data from a variety of sources in diverse formats, serving equally varied destinations for Business Intelligence and Advanced Analytics. The Centers for Medicare and Medicaid Services (CMS) is a federal health agency within the Department of Health and Human Services (HHS). It is the single largest payer for health care in the United States, serving nearly 90 million Americans who rely on health care benefits through Medicare, Medicaid, and the State Children’s Health Insurance Program (CHIP). CMS recently adopted Apache Spark as its big data processing platform to ingest and analyze clinical and claims data from various sources and to produce healthcare models designed to improve patients’ health while reducing costs. The data come from multiple sources and contain Personally Identifiable Information (PII) and Protected Health Information (PHI), so a data governance framework with robust security controls is essential. At the same time, that framework must serve multiple business units, each with several roles that require different levels of access to the data. This presentation will cover data governance best practices, including data security, data stewardship, and data quality management, using both open source and commercial tools and drawing on lessons learned from the Apache Spark implementation at CMS.
Donghwa Kim is the Director of Application Engineering at NewWave, one of the premier technology partners for the Centers for Medicare and Medicaid Services (CMS). He currently serves as the lead technical architect for one of the Center for Medicaid and CHIP Services programs. He has over 18 years of IT experience, having worked for companies such as IBM, Lehman Brothers, JP Morgan Chase, and FINRA. His responsibilities at NewWave include overseeing the company's data science practices.