Spark 2.3.0 set a great foundation for using Apache Arrow to increase Python performance and interoperability with Pandas. Come by and share your use cases to see whether Arrow could improve your Spark jobs. We will also discuss possible next steps for leveraging Arrow in Spark and how it could jumpstart machine learning and deep learning workloads.
Bryan Cutler is a software engineer at IBM's Spark Technology Center, where he works on big data analytics and machine learning systems. He is a contributor to Apache Spark in the areas of ML, SQL, Core, and Python, and a committer for the Apache Arrow project. His interests are in pushing the boundaries of software to build high-performance tools that are also a snap to use.
Li Jin is a software engineer at Two Sigma. Li focuses on building high-performance data analysis tools with Python and Spark for financial data. Li is a co-creator of Flint, a time series analysis library on Spark. Previously, Li worked on building a large-scale task scheduling system. In his spare time, Li loves hiking, traveling, and winter sports.