Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”

NEC recently released a new vector system, “SX-Aurora TSUBASA”. The system is typically used for HPC, but it is also designed for data analytics: the vector processor is built as a PCIe-attached accelerator. Compared with GPGPUs, it is well suited to memory-intensive workloads, which are common in statistical machine learning and data frame processing. To accelerate data analytics on Spark, we have created an acceleration framework, “Frovedis”, for SX-Aurora TSUBASA. It supports several MLlib machine learning algorithms and DataFrame processing, all fully optimized for the vector processor.

Frovedis is also optimized for distributed systems with multiple vector processors, and its API is mostly the same as that of Spark MLlib and DataFrame. These features enable Spark developers to use multiple vector processors seamlessly from Spark and obtain a large performance improvement. Our performance evaluation shows that Frovedis on the vector processor achieves a 10x to 50x speedup on several machine learning and data frame kernels compared with Spark on Xeon Gold.


See More Spark + AI Summit in San Francisco 2019 Videos

About Takeo Hosomi

Takeo Hosomi is a senior architect at NEC Data Science Research Laboratories. He has broad experience in high-performance computing and big data.

About Takuya Araki

Takuya Araki received B.E., M.E., and Ph.D. degrees from the University of Tokyo, Japan, in 1994, 1996, and 1999, respectively. He was a visiting researcher at Argonne National Laboratory from 2003 to 2004. He is currently a Senior Principal Researcher at Data Science Research Laboratories, NEC Corporation. His research interests include parallel and distributed computing and their application to AI and machine learning. He is a director of the Information Processing Society of Japan (IPSJ).