Incremental Change Data Capture: A Data-Informed Journey
OVERVIEW
EXPERIENCE | In Person
---|---
TYPE | Breakout
TRACK | Data Engineering and Streaming
INDUSTRY | Enterprise Technology
TECHNOLOGIES | Apache Spark, Delta Lake, Developer Experience
SKILL LEVEL | Intermediate
DURATION | 40 min
In this session, I will show you how I iterated on incremental ingestion from SaaS applications, relational databases, and event streams into a centralized data lake. This is a journey of decisions grounded in evidence rather than buzzwords, and of adjustments driven by specific use cases instead of de facto standards. You will walk away with a data-informed mentality for designing architecture that promotes long-term stewardship and developer happiness. I begin with sourcing from Salesforce and explain how Overwatch's insights helped load-balance connectors, cutting costs by roughly three quarters. I then present three flavors of CDC, from the most naive to the most feature-rich, from batch polling to log streaming. Query-based CDC and Lakehouse Federation reduced maintenance overhead and eliminated 70% of our bugs. Liquid Clustering addressed data skew across customers and dramatically increased write performance. With the latest Delta Lake, you can streamline maintenance and improve reliability.
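To make the simplest of the three CDC flavors concrete, here is a minimal sketch of query-based (batch-polling) CDC: repeatedly query the source table for rows modified after a high-water mark, then advance the watermark. This is an illustration, not the speaker's implementation; the table and column names (`orders`, `updated_at`) are hypothetical, and it uses an in-memory SQLite database purely so the example is self-contained. A real pipeline would persist the watermark durably and handle ties and late-arriving updates.

```python
# Query-based CDC sketch (hypothetical schema): poll for rows changed
# since the last watermark. SQLite stands in for the source database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", 100), (2, "shipped", 200), (3, "new", 300)],
)

def poll_changes(conn, watermark):
    """Return rows modified strictly after `watermark`, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the latest change we saw (or keep it unchanged).
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

changes, wm = poll_changes(conn, 100)  # picks up rows 2 and 3
```

The main trade-off this flavor carries is the one the session hints at: it is easy to build and maintain, but it misses hard deletes and intermediate states between polls, which is what motivates the move toward log-based streaming CDC.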
SESSION SPEAKERS
Christina Taylor
/Data Engineering Lead
Abridge AI