Stranger Triumphs: Automating Spark Upgrades & Migrations at Netflix
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Breakout |
TRACK | Data Engineering and Streaming |
INDUSTRY | Enterprise Technology, Media and Entertainment |
TECHNOLOGIES | Apache Spark, ETL, Orchestration |
SKILL LEVEL | Intermediate |
DURATION | 40 min |
With Apache Spark™ 4 in the pipeline for this year, many of us are looking at what will be involved in upgrading to the latest and greatest Spark – not to mention the ever-evolving world of AI libraries. This talk examines how Netflix has automated large parts of our upgrade and how you can use these techniques for your data platform. We will share:
- Our cool open source tools that rewrite Spark code
- Our tools for testing Spark jobs in production
- How we track the state of jobs
- Re-using those same tools to migrate to a containerized environment
- User experiences
In this session, you will learn how to upgrade your Spark pipelines without crying and how to validate Spark pipelines even when you don't trust the tests (by extending the write-audit-publish pattern). This talk is ideal for data scientists, ML engineers, platform engineers managing Spark infrastructure, and anyone who's inherited legacy data products.
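For readers unfamiliar with the write-audit-publish pattern mentioned above, here is a minimal sketch of the basic idea in plain Python. It is not the tooling from the talk, and it does not show the extension the speakers describe. The function names (`write_audit_publish`, `non_empty_and_no_blank_ids`) and the CSV file layout are assumptions made purely for illustration. Output is written to a staging location, audited there, and only swapped into the published location if the audit passes.

```python
import csv
import os
import tempfile


def write_audit_publish(rows, publish_path, audit):
    """Stage rows in a temporary file, audit them, and publish only on success.

    Hypothetical helper for illustration; returns True if the data was published.
    """
    target_dir = os.path.dirname(os.path.abspath(publish_path))
    # Write: stage next to the target so the final rename stays on one filesystem.
    fd, staging_path = tempfile.mkstemp(suffix=".staging", dir=target_dir)
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        # Audit: re-read what was actually written, not the in-memory rows.
        with open(staging_path, newline="") as handle:
            staged = list(csv.reader(handle))
        if not audit(staged):
            return False
        # Publish: atomically replace the previously published file.
        os.replace(staging_path, publish_path)
        return True
    finally:
        # Clean up the staging file if it was not published.
        if os.path.exists(staging_path):
            os.remove(staging_path)


def non_empty_and_no_blank_ids(staged_rows):
    """Example audit (an assumption): at least one row, and no blank first column."""
    return bool(staged_rows) and all(row and row[0] != "" for row in staged_rows)
```

Calling `write_audit_publish(rows, "out.csv", non_empty_and_no_blank_ids)` leaves any previously published `out.csv` untouched when the audit fails, which is the property that lets a job be validated without trusting its tests. In a Spark setting the staging and publish steps would typically be table or partition operations rather than file renames.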
SESSION SPEAKERS
Holden Karau
Engineer
Netflix / Totally Legit Co
Robert Morck
Software engineer
Netflix