Airbnb primarily leverages Spark to power mission-critical data applications. In this talk, we would like to share our major production use cases, including both streaming and batch processing applications. In addition, we would like to share the optimizations we made to improve the throughput of the Spark Kafka connector by 10X. Furthermore, we plan to share our journey and the lessons learned while upgrading Spark 1+ applications to Spark 2+. The key takeaways include best practices learned from building and scaling production Spark applications, as well as tips on and benefits of migrating to Spark 2.x. We hope to share our experience of making Spark successful at Airbnb with a broader audience of Spark users.
Hao is a senior software engineer on Airbnb's Data Platform team. He has been leading the development and driving the adoption of data infrastructure based on Apache Spark at Airbnb. Before that, he helped build the data and machine learning infrastructure for IBM Watson. He received his PhD from the University of Southern California.
Liyin Tang is a software engineer on the Data Infrastructure team at Airbnb. Before Airbnb, he worked at Facebook and Dropbox. He focuses on building highly available and reliable storage services and helping those services scale in the face of exponential data growth. Mr. Tang joined the HBase PMC in 2013 and has also contributed to other Apache projects, including HDFS and Hive. Recently, he has been building a streaming infrastructure to power real-time data products at Airbnb. He holds a master's degree in computer science from the University of Southern California.