Grace Huang is currently an engineering director in PayPal Global Data Governance and Regulation Technology, responsible for analytical data solutions. Using advanced big data and AI technology, her teams enable international e-payment services across various data-driven business domains, including risk management and enterprise compliance. Grace has 10+ years of engineering experience in the big data industry after obtaining her master's degree from Shanghai Jiao Tong University, where she focused on pattern recognition and image processing.
Spark is now widely adopted in large enterprises for handling large volumes of data. At PayPal, more and more complex data processing applications run on top of Spark because of its better performance and ease of use. Graph analytics is an emerging trend across different business use cases, e.g., risk control and compliance. In this talk, we would like to share our practices from building large-scale graph applications on top of Spark. How can we achieve 4-5x performance improvements while handling billions of nodes and edges? How can we balance performance and resources efficiently? What are the key learnings from running enterprise production-level pipelines with Spark?