Anshul Bajpai

Technical Lead (Data Engineering), Healthdirect Australia

Anshul Bajpai is an enthusiastic Data Engineering geek who currently works as Data Architect / Technical Lead at Healthdirect Australia, a public health sector company. He has 12+ years of IT experience across large enterprise systems in industries such as oil and gas, pharmaceuticals, e-commerce and travel. This includes 5+ years of extensive experience designing, prototyping, building and deploying scalable data processing pipelines on distributed platforms using Scala, Spark, Databricks Delta Lake, Kafka and the Hadoop ecosystem. He is passionate about solving complex problems in compute-intensive big data systems that involve high volume, variety and velocity.

Past sessions

Healthcare directories underpin most healthcare systems around the world and are often a core component that enables initiatives like 'Care Coordination'. For example, if your doctor needs to refer you to a specialist, they use a healthcare directory to find one. Similarly, if your hospital needs to send a discharge summary to your doctor, it uses a secure messaging lookup that is powered by a healthcare directory. Because of critical use cases like these, healthcare directories often become a "single point of failure" for healthcare systems. This is especially true when the data quality within the directory is poor.

In our session, we will present how the NHSD** implemented a 'Federated Data Directory Platform' that ingests data from multiple sources (Authoritative Systems of Record) and performs data operations such as validation, matching, merging, enrichment and versioning, while generating and maintaining comprehensive data lineage, attribution and provenance. The goal is to continually improve the data quality, governance and completeness of Australia’s national directory of health services and practitioners. We will also cover how we currently ‘rank’ (promote/demote) input data sources based on manual audit outcomes, and how we intend to use machine learning to automatically classify preferred data sources. Finally, we will detail our solution architecture, built on Databricks Delta Lake and Spark Structured Streaming.
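To make the idea of ranked sources feeding a merge step more concrete, here is a minimal sketch of field-level "survivorship" merging with per-field attribution. It is an illustration only, not the NHSD implementation: the source names, the `SOURCE_RANK` table, `SourceRecord` and `merge_records` are all hypothetical, the ranks are hard-coded rather than derived from audit outcomes, matching is assumed to have already grouped records under one `entity_id`, and plain Python dictionaries stand in for Delta Lake tables and Spark Structured Streaming.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical source ranking: a lower number means a more trusted source.
# Promoting or demoting a source would amount to changing its number here.
SOURCE_RANK: Dict[str, int] = {"source_a": 1, "source_b": 2, "source_c": 3}


@dataclass
class SourceRecord:
    source: str                        # which system of record supplied the data
    entity_id: str                     # assumed output of an upstream matching step
    fields: Dict[str, Optional[Any]]   # attribute name -> value (None if missing)


def merge_records(
    records: List[SourceRecord],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Merge matched records field by field.

    Each field takes the first non-null value from the highest-ranked source.
    The provenance map records which source supplied each surviving value.
    Sources missing from SOURCE_RANK are treated as least trusted.
    """
    ordered = sorted(records, key=lambda r: SOURCE_RANK.get(r.source, float("inf")))
    merged: Dict[str, Any] = {}
    provenance: Dict[str, str] = {}
    for rec in ordered:
        for name, value in rec.fields.items():
            if value is not None and name not in merged:
                merged[name] = value
                provenance[name] = rec.source
    return merged, provenance


if __name__ == "__main__":
    records = [
        SourceRecord("source_c", "prac-001", {"name": "Dr J. Smith", "phone": "02 9000 0000"}),
        SourceRecord("source_a", "prac-001", {"name": "Dr Jane Smith", "phone": None}),
        SourceRecord("source_b", "prac-001", {"name": None, "phone": "02 9111 1111"}),
    ]
    merged, provenance = merge_records(records)
    print(merged)      # values chosen from the most trusted source that has them
    print(provenance)  # which source each value came from
```

Keeping a provenance entry for every surviving field is what makes attribution auditable. If a manual audit later demotes a source, re-running the merge with updated ranks shows exactly which fields change.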

** Launched in 2012, the National Health Services Directory (NHSD) is a national directory of health services and the practitioners who provide them. This key piece of national digital health infrastructure was established by an Australian Health Ministers’ Advisory Council (AHMAC) agreement. It is jointly funded by the state and federal government departments of health and is managed by Healthdirect Australia.