Apache Spark’s Structured Streaming library provides a powerful set of primitives for building streaming data processing pipelines. However, it is not always obvious how to take full advantage of this power in a way that works naturally with your application’s unique business logic. If you associate algebra with solving equations while wishing you were doing something else, think again: we’ll see how to apply the properties of operations we all understand, like addition, multiplication, and set union, to reason about our data engineering pipelines.
Attendees will learn easy techniques for exploiting algebraic patterns in their data processing logic that work seamlessly with Spark’s Structured Streaming constructs, effectively extending Spark’s native primitives with their own custom data processing operations. These simple yet powerful ideas will be illustrated with real-world examples.
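The core algebraic idea can be sketched outside Spark. The following is a minimal Python sketch, assuming a hypothetical per-key summary (record count, integer total, and set of distinct users). Because its `combine` operation is built from addition and set union, it is associative and commutative and has an identity element, so partial results from separate micro-batches can be merged into running state and still match a single pass over all the data. The names `Summary`, `ZERO`, `lift`, and `summarize` are illustrative only and are not part of any Spark API; the sketch does not use Spark. In Spark’s Scala API, a type with this shape maps roughly onto the `zero`, `reduce`, and `merge` methods of `org.apache.spark.sql.expressions.Aggregator`.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Summary:
    """Hypothetical per-key summary forming a monoid under `combine`."""
    count: int = 0
    total: int = 0
    users: FrozenSet[str] = frozenset()

    def combine(self, other: "Summary") -> "Summary":
        # Addition and set union are both associative and commutative,
        # so this field-wise combination is too.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            self.users | other.users,
        )

    @property
    def mean(self) -> float:
        # Derived from the summary at read time; the mean itself is never merged.
        return self.total / self.count if self.count else 0.0


# Identity element: ZERO.combine(s) == s.combine(ZERO) == s for any s.
ZERO = Summary()


def lift(user: str, value: int) -> Summary:
    """Map one raw record into the monoid."""
    return Summary(1, value, frozenset({user}))


def summarize(records: Iterable[Tuple[str, int]]) -> Summary:
    """Fold records into a single Summary, starting from the identity."""
    return reduce(Summary.combine, (lift(u, v) for u, v in records), ZERO)


records = [("ann", 3), ("bob", 5), ("ann", 4), ("cid", 8), ("bob", 2), ("ann", 1)]

# Reference answer: process the whole dataset in one pass.
all_at_once = summarize(records)

# Simulated stream: the same records arrive as micro-batches of two.
# Each batch is reduced independently, then merged into running state.
state = ZERO
for start in range(0, len(records), 2):
    state = state.combine(summarize(records[start:start + 2]))

# Associativity makes the streaming result match the one-pass result;
# commutativity means merging the batches in reverse order also agrees.
reversed_state = reduce(
    Summary.combine,
    [summarize(records[s:s + 2]) for s in range(0, len(records), 2)][::-1],
    ZERO,
)
assert state == all_at_once == reversed_state
print(state.count, state.total, sorted(state.users), round(state.mean, 2))
```

The sketch keeps totals as integers because floating-point addition is only approximately associative, and it stores the count and total rather than a running mean, since averaging partial averages does not combine correctly.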
Session hashtag: #SAISDev2
Erik Erlandson is a Software Engineer at Red Hat, where he investigates analytics use cases and scalable deployments for Apache Spark in the cloud. He also consults on internal data science and analytics projects. Erik is a contributor to Apache Spark and other open source projects in the Spark ecosystem, including the Spark on Kubernetes community project, Algebird, and Scala.