The Internals of Stateful Stream Processing in Spark Structured Streaming - Databricks

Let’s talk about state management in Spark Structured Streaming. In this talk you will learn the streaming concepts that matter most for stateful stream processing in Structured Streaming, such as watermarks and output modes, as well as GroupState and GroupStateTimeout. We will explore simple stateful processing with the groupBy operator, more advanced use cases with KeyValueGroupedDataset.mapGroupsWithState, and the most advanced operator, KeyValueGroupedDataset.flatMapGroupsWithState. In other words, you will learn how to use the stateful streaming API and understand its internals.
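To make the operators named above concrete, here is a minimal sketch of arbitrary stateful processing with KeyValueGroupedDataset.flatMapGroupsWithState. It is an illustration and not code from the talk. The Event and KeyCount case classes, the updateCount function, the built-in rate source used as input, and the 30-second timeout are all assumptions chosen for the example.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical input and output types (assumptions, not from the talk).
case class Event(key: String, timestamp: Timestamp)
case class KeyCount(key: String, count: Long, expired: Boolean)

object StatefulCountSketch {

  // State update function: invoked per key for every micro-batch that
  // contains rows for that key, and once more when the key's timeout fires.
  def updateCount(
      key: String,
      events: Iterator[Event],
      state: GroupState[Long]): Iterator[KeyCount] = {
    if (state.hasTimedOut) {
      // No rows arrived within the timeout: emit the final count, drop the state.
      val finalCount = state.get
      state.remove()
      Iterator(KeyCount(key, finalCount, expired = true))
    } else {
      // Add this batch's rows to the running count kept in the state store.
      val newCount = state.getOption.getOrElse(0L) + events.size
      state.update(newCount)
      // Assumed timeout; it is re-armed every time the key receives data.
      state.setTimeoutDuration("30 seconds")
      Iterator(KeyCount(key, newCount, expired = false))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stateful-count-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // The rate source emits (timestamp, value) rows; derive three keys from value.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .select(($"value" % 3).cast("string").as("key"), $"timestamp")
      .as[Event]

    val counts = events
      .groupByKey(_.key) // yields a KeyValueGroupedDataset[String, Event]
      .flatMapGroupsWithState[Long, KeyCount](
        OutputMode.Update(),
        GroupStateTimeout.ProcessingTimeTimeout())(updateCount)

    counts.writeStream
      .format("console")
      .outputMode("update") // must match the operator's output mode
      .start()
      .awaitTermination()
  }
}
```

The sketch uses a processing-time timeout, so it needs no watermark. Switching to GroupStateTimeout.EventTimeTimeout would additionally require a withWatermark call on the input and state.setTimeoutTimestamp in place of setTimeoutDuration.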

 

See More Spark + AI Summit Europe 2019 Videos

About Jacek Laskowski

Development and training services

Jacek is an independent consultant who offers development and training services for Apache Spark (as well as Scala and sbt, with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). He leads the Warsaw Scala Enthusiasts and Warsaw Spark meetups. His latest project is to gain an in-depth understanding of Apache Spark, which he documents at https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.