Building Data Quality Audit Framework using Delta Lake at Cerner

Cerner needs to know what assets it owns, where they are located, and the status of those assets. A configuration management system is an inventory of IT assets such as servers, network devices, storage arrays, and software licenses. There was a need to bring all the data sources into one place so that Cerner has a single source of truth for configuration. This gave birth to a data platform called Beacon. Bad data quality has significant business costs in time, effort and accuracy. Poor-quality data is often pegged as the source of operational snafus, inaccurate analytics and ill-conceived business strategies. In our case, because configuration data is widely used in decisions about security, incident management, cost analysis and similar areas, gaps in the data caused downstream impact. To handle data quality issues, Databricks and Delta Lake were introduced at the helm of the data pipeline architecture. In this talk we’ll describe the journey behind building an end-to-end pipeline that conforms to industry CI/CD standards, from data ingestion and processing to reporting and machine learning, and how Delta Lake plays a vital role not only in catching data issues but also in making the pipeline scalable and reusable for other teams. We’ll also talk about the challenges we faced along the way and the lessons we learned from them.


 
About Madhav Agni

Cerner

Madhav Agni is the Lead Software Engineer on Cerner's configuration management team, working with technologies such as Data Lake Store, Databricks, Kafka, and Power BI. Madhav's primary areas of focus are technical and data architecture and leading teams in the US and India in building the data platform. He also serves as a key member of the architecture governance board inside the organization. Madhav previously worked as an ETL lead at USAA, specializing in data warehousing using technologies such as DataStage and PeopleSoft.