Structured Streaming Use-Cases at Apple

Structured Streaming plays an important role in today's data infrastructure. In response to demanding streaming requirements, we have been actively developing Structured Streaming in Spark over the past few months. In this talk, Kristine Guo and Liang-Chi Hsieh will detail some of the issues that arose when applying Structured Streaming and what was done to address them. Specifically, they will cover:

  • How streaming applications that need to maintain large amounts of state require a scalable state store provider as an alternative to the in-memory solution built into Spark.
  • How Structured Streaming currently lacks session window support, and why implementing a custom window with the mapGroupsWithState/flatMapGroupsWithState API does not generalize well across applications and is hard to maintain.
  • Why we focused on Structured Streaming efforts such as the RocksDB state store and session windowing.
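
To make the session window idea concrete, here is a minimal sketch in plain Python of gap-based sessionization: events for the same key are merged into one session as long as each arrives before the previous session end (last event time plus the gap). The names `Session` and `sessionize`, the sample gap, and the choice to treat an event arriving exactly at the session end as starting a new session are illustrative assumptions. This is not Spark's API or the speakers' implementation, and it ignores watermarks, late data, and state storage.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Session:
    # Hypothetical container for one session; not a Spark type.
    key: str
    start: float  # timestamp of the first event in the session
    end: float    # timestamp of the last event plus the gap (exclusive)
    count: int    # number of events merged into the session


def sessionize(events: Iterable[Tuple[str, float]], gap: float) -> List[Session]:
    """Group (key, timestamp) events into gap-based sessions per key."""
    by_key: Dict[str, List[float]] = {}
    for key, ts in events:
        by_key.setdefault(key, []).append(ts)

    sessions: List[Session] = []
    for key in sorted(by_key):
        current: Optional[Session] = None
        for ts in sorted(by_key[key]):
            if current is not None and ts < current.end:
                # Event falls inside the open session: extend it.
                current.end = max(current.end, ts + gap)
                current.count += 1
            else:
                # First event for this key, or the gap has elapsed:
                # start a new session.
                current = Session(key, ts, ts + gap, 1)
                sessions.append(current)
    return sessions
```

Calling `sessionize([("a", 0), ("a", 3), ("a", 20), ("b", 1)], gap=5)` groups the first two events for key "a" into one session and places the third in a session of its own. A streaming engine has to do the same merging incrementally while keeping the open sessions in a state store, which is where hand-written per-application logic becomes hard to maintain.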

Finally, they will detail how these features can help to compute aggregates over dynamic batches with minimum size requirements and perform stream-stream joins, while supporting high RPS and throughput.

About Kristine Guo

Kristine is a software engineer at Apple focused on cloud platform technologies. She currently works on developing high-scale backend systems. Prior to joining Apple, Kristine obtained her Bachelor's and Master's degrees in Computer Science from Stanford University.

About Liang-Chi Hsieh

Liang-Chi Hsieh is an Apache Spark Committer and an open source and big data engineer at Apple. Most of his contributions to Apache Spark are in the SQL and MLlib modules. He has recently been working on Structured Streaming. Prior to joining Apple, Liang-Chi worked on the internal Spark platform at Uber. He holds a Ph.D. in Computer Science from National Taiwan University.