Hao is a senior software engineer on Airbnb’s Data Platform team. He has been leading the development and driving the adoption of data infrastructure based on Apache Spark at Airbnb. Before that, he helped build the data and machine learning infrastructure for IBM Watson. He received his PhD from the University of Southern California.
Airbnb primarily uses Spark to power mission-critical data applications. In this talk, we would like to share our major production use cases, including both streaming and batch processing applications. We would also like to share the optimizations that improved the throughput of the Spark Kafka connector by 10x. Finally, we plan to share our journey and the lessons we learned while upgrading Spark 1.x applications to Spark 2.x. The key takeaways include best practices for building and scaling production Spark applications, as well as tips for migrating to Spark 2.x and the benefits of doing so. We hope to share our experience of making Spark successful at Airbnb with the broader community of Spark users.