Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments - Databricks

While processing more data through an existing set of ETL or ML/AI pipelines is easy with Spark, dealing with an ever-expanding or changing set of pipelines can be quite challenging, all the more so when there are complex interdependencies. Workflow-based job orchestration offers some help in the case of relatively static flows but fails miserably when it comes to supporting fast-paced data production such as data science experimentation (feature exploration, model tuning, …), ad hoc analytics and root cause analysis.

This talk will introduce three patterns for large-scale data production in fast-paced environments: just-in-time dependency resolution (JDR), configuration-addressed production (CAP) and automated lifecycle management (ALM). It includes ETL and ML/AI demos as well as open-source code you can use in your projects. These patterns have been production-tested in Swoop’s petabyte-scale environment, where they have significantly increased human productivity and processing flexibility while reducing costs by more than 10x.

By adopting these patterns you’ll get the benefits typically associated with rigidly planned and highly coordinated data production quickly and efficiently, without endless meetings or even a workflow server. You will be able to transparently ensure result accuracy even in the face of hundreds of constantly changing inputs, eliminate duplicate computation within and across clusters and automate lifecycle management.
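
The abstract names the patterns without defining them, so the following is only one plausible reading, not the speaker's implementation. It sketches, in plain Python without Spark, how configuration-addressed production and just-in-time dependency resolution might combine. A result is stored under a hash of the full configuration that produced it, and dependencies are produced only when a requested result is missing. Every name here (config_address, ResultStore, full_config, resolve, the recipes layout) is hypothetical, and automated lifecycle management is not shown.

```python
import hashlib
import json


def config_address(config: dict) -> str:
    """Derive a stable address from a configuration (hypothetical scheme)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ResultStore:
    """In-memory stand-in for shared storage that several clusters could read."""

    def __init__(self):
        self._results = {}
        self.computations = 0  # counts how often something was actually produced

    def get_or_produce(self, config, produce):
        address = config_address(config)
        if address not in self._results:
            self.computations += 1
            self._results[address] = produce()
        return self._results[address]


def full_config(name, recipes):
    """A dataset's configuration includes the configurations of its inputs,
    so any upstream change yields a new address downstream."""
    recipe = recipes[name]
    return {
        "name": name,
        "params": recipe["params"],
        "deps": {dep: full_config(dep, recipes) for dep in recipe["deps"]},
    }


def resolve(name, recipes, store):
    """Resolve dependencies just in time: only when `name` is not yet stored."""
    recipe = recipes[name]

    def produce():
        inputs = {dep: resolve(dep, recipes, store) for dep in recipe["deps"]}
        return recipe["fn"](recipe["params"], inputs)

    return store.get_or_produce(full_config(name, recipes), produce)


# Illustrative recipes (assumed layout): raw events and a derived feature set.
recipes = {
    "events": {
        "params": {"source": "demo"},
        "deps": [],
        "fn": lambda params, inputs: [1, 2, 3, 4],
    },
    "features": {
        "params": {"scale": 2},
        "deps": ["events"],
        "fn": lambda params, inputs: [x * params["scale"] for x in inputs["events"]],
    },
}

store = ResultStore()
first = resolve("features", recipes, store)   # produces events, then features
second = resolve("features", recipes, store)  # same configuration: reused as-is
count_after_reuse = store.computations

recipes["features"]["params"]["scale"] = 3    # an experiment changes one parameter
third = resolve("features", recipes, store)   # features recomputed, events reused
```

In this sketch no workflow definition is needed: asking for a result is enough to trigger whatever upstream production is missing, and changing a parameter cannot silently return a stale result because the address changes with the configuration.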

Session hashtag: #SAISDev1

About Sim Simeonov

Sim Simeonov is the founding CTO of Swoop, a startup that brings the power of search advertising to content. Previously, Sim was the founding CTO at Ghostery, the platform for safe & fast digital experiences, and Thing Labs, a social media startup acquired by AOL. Earlier, Sim was vice president of emerging technologies and chief architect at Macromedia (now Adobe) and chief architect at Allaire, one of the first Internet platform companies. He blogs at blog.simeonov.com, tweets as @simeons and lives in the Greater Boston area with his wife, son and an adopted dog named Tye.