Efficient Near Real-Time Event Ingestion using DLT: Insights and Lessons
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Data Engineering and Streaming |
INDUSTRY | Enterprise Technology |
TECHNOLOGIES | Apache Spark, Delta Lake, ETL |
SKILL LEVEL | Intermediate |
DURATION | 20 min |
Follow Nextdoor's migration from hourly batch event ingestion to a near-real-time streaming solution built on DLT (Delta Live Tables). The shift lets internal users such as analysts, data scientists, and engineers query events promptly for analysis, monitoring, and real-time aggregations, while also reducing compute cost. Learn about the motivation, challenges, and lessons learned during the migration. Discover insights into using file notification instead of directory listing, effective monitoring techniques, and resolving friction between streaming and batch pipelines. See how custom Spark metrics help determine optimal data consumption points, and get a glimpse of how schema evolution handles changing event schemas within DLT.
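
As a rough illustration of the file notification and schema evolution points, the sketch below builds reader options for Databricks Auto Loader (the `cloudFiles` source). It assumes the pipeline ingests through Auto Loader, which the abstract does not state explicitly. The helper name `auto_loader_options` and the JSON default are hypothetical, and this is not the speaker's actual configuration.

```python
def auto_loader_options(file_format="json", use_notifications=True):
    """Build Auto Loader (cloudFiles) reader options as a plain dict.

    Hypothetical helper for illustration only; the defaults are assumptions.
    """
    return {
        "cloudFiles.format": file_format,
        # Use file notification mode rather than directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Add newly seen columns to the schema as event schemas evolve.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


options = auto_loader_options()

# Inside a pipeline, the dict would typically be passed to a streaming reader:
#   spark.readStream.format("cloudFiles").options(**options).load(events_path)
# where events_path is a hypothetical storage location for raw events.
```

Keeping the options in a plain dict makes it easy to switch between notification and listing modes per environment without touching the reader code.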
SESSION SPEAKERS
Kavin Palanisamy
Software Engineer - Data Platform
Nextdoor