Mark Paul - Databricks

Mark Paul

Engineering Manager, Healthdirect Australia

Mark Paul has 15 years of experience in large-scale software development. Having worked in frontend, backend, data engineering, and architecture roles, he has gained practical knowledge of how to build distributed software solutions that scale. He currently works for Healthdirect Australia (an Australian Government agency), solving complex data quality issues in the public health space.


How Australia’s National Health Services Directory Built a Federated Data Platform using Databricks Delta Lake and Spark Structured Streaming

Summit 2020

This talk continues 'How Australia's National Health Services Directory (NHSD) Improved Data Quality, Reliability, and Integrity with Databricks Delta Lake and Structured Streaming', given by our Solution Architect at Spark Summit 2019. We hope to present how the NHSD implemented a 'Federated Data Platform' that ingests data from multiple sources (such as the authoritative system of record, commercial vendors, etc.) and performs data operations such as validation, matching, merging, and versioning, while generating and maintaining comprehensive data lineage, attribution, and provenance, in a quest to continually improve data quality, governance, and completeness. We will also cover how we currently 'rank' (promote or demote) input data sources based on manual audit outcomes, and how we intend to use machine learning to automatically classify preferred data sources when multiple sources compete to update the same data attributes. We intend to show code snippets that demonstrate key features and functionality of our platform.
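To make the idea of ranked, competing sources concrete, here is a minimal sketch in plain Python. It is not the NHSD implementation: the source names, the `Update` record, the `SOURCE_RANK` table, and the `merge_updates` and `demote` helpers are all hypothetical, and a real platform would presumably do this at scale on Delta Lake tables rather than in memory. The sketch only illustrates picking one value per attribute from the best-ranked source while recording which source supplied it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One proposed attribute value from one input source (hypothetical shape)."""
    source: str
    attribute: str
    value: str


# Hypothetical ranking: a lower number means a more trusted source.
SOURCE_RANK = {"system_of_record": 0, "vendor_a": 1, "vendor_b": 2}


def merge_updates(updates, source_rank):
    """Keep one value per attribute, preferring the best-ranked source.

    Each merged entry also records the winning source, as a simple
    stand-in for attribution and provenance.
    """
    merged = {}
    for u in updates:
        # Sources missing from the ranking are treated as least trusted.
        rank = source_rank.get(u.source, len(source_rank))
        current = merged.get(u.attribute)
        if current is None or rank < current["rank"]:
            merged[u.attribute] = {"value": u.value, "source": u.source, "rank": rank}
    return merged


def demote(source_rank, source):
    """Return a new ranking with `source` moved to last place,
    for example after a failed manual audit."""
    ordered = sorted(source_rank, key=source_rank.get)
    if source in ordered:
        ordered.remove(source)
    ordered.append(source)
    return {name: position for position, name in enumerate(ordered)}


updates = [
    Update("vendor_a", "phone", "111"),
    Update("system_of_record", "phone", "222"),
    Update("vendor_b", "fax", "333"),
]
merged = merge_updates(updates, SOURCE_RANK)
```

With the ranking above, the competing `phone` updates resolve to the value from `system_of_record`. Calling `demote(SOURCE_RANK, "system_of_record")` and merging again would let `vendor_a` win instead, which mirrors the promote/demote behaviour described in the abstract.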