Fuwang Hu is currently an MTS-1 data engineer in PayPal Global Data Governance and Regulation Technology, where he develops data applications for a range of business scenarios, including risk management and enterprise compliance. Since obtaining his master's degree from Tongji University, Fuwang has spent 5+ years building data applications with big data technologies such as Spark, Hadoop, and HBase.
Spark is now widely adopted by large enterprises for processing large volumes of data. At PayPal, more and more complex data processing applications run on top of Spark because of its performance and ease of use. Graph analytics is an emerging trend across business use cases such as risk control and compliance. In this talk, we share our experience building large-scale graph applications on top of Spark. How can you achieve 4-5x performance improvements while handling billions of nodes and edges? How can you balance performance against resource usage efficiently? What are the key lessons from running enterprise production-level pipelines on Spark?