Liang-Chi Hsieh

Software Engineer, Apple

Liang-Chi Hsieh is an Apache Spark committer and an open source and big data engineer at Apple. Most of his contributions to Apache Spark are in the SQL and MLlib modules, and he has recently been working on Structured Streaming. Prior to joining Apple, Liang-Chi worked on the internal Spark platform at Uber. He holds a Ph.D. in Computer Science from National Taiwan University.

Past sessions

Summit 2021 Structured Streaming Use-Cases at Apple

May 27, 2021 11:00 AM PT

Structured streaming plays an important role in current data infrastructure. In response to heavy demand for streaming, we have actively worked on developing structured streaming in Spark over the past few months. In this talk, Kristine Guo and Liang-Chi Hsieh will detail some of the issues that arose when applying structured streaming and what was done to address them. Specifically, they will cover:

  • How streaming applications that need to maintain large amounts of state require a scalable state store provider as an alternative to the in-memory solution built into Spark.
  • How structured streaming currently lacks session window support: a custom window can be implemented with the mapGroupsWithState/flatMapGroupsWithState API, but this approach does not generalize well across applications and is hard to maintain.
  • Why we focused on structured streaming efforts like the RocksDB state store and session windowing.

Finally, they will detail how these features can help to compute aggregates over dynamic batches with minimum size requirements and perform stream-stream joins, while supporting high RPS and throughput.
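
For readers unfamiliar with session windows, the idea is that events are grouped by gaps in activity rather than by fixed intervals. The following is a minimal plain-Python sketch of that gap-based grouping, not Spark's implementation and not the code presented in the talk. The function name `sessionize`, the numeric timestamps, and the half-open window convention (a session ends at its last event time plus the gap) are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def sessionize(timestamps: Iterable[float], gap: float) -> List[Tuple[float, float, int]]:
    """Group event timestamps into gap-based sessions (illustrative only).

    Each session is returned as (start, end, event_count), where `end` is the
    last event time plus `gap`. An event extends the current session only if
    it arrives strictly before that session's end (assumed half-open window).
    """
    sessions: List[Tuple[float, float, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event falls inside the open session: push the end out and count it.
            start, _, count = sessions[-1]
            sessions[-1] = (start, ts + gap, count + 1)
        else:
            # Gap exceeded (or first event): open a new session.
            sessions.append((ts, ts + gap, 1))
    return sessions


if __name__ == "__main__":
    # Events at 0, 3 and 4 fall within 5 units of each other; 20 and 22 form a second session.
    print(sessionize([0, 3, 4, 20, 22], gap=5))
```

A streaming engine has to do this incrementally and per key, keeping open sessions in state between micro-batches, which is part of why a hand-rolled version built on arbitrary stateful operators is hard to generalize and maintain.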

In this session watch:
Kristine Guo, Software Engineer, Apple
Liang-Chi Hsieh, Software Engineer, Apple
