NEC has recently released a new vector system, “SX-Aurora TSUBASA”. This system is typically used for HPC, but it is also designed for data analytics, with the vector processor built as a PCIe-attached accelerator. Compared with GPGPUs, it is well suited to memory-intensive workloads, which are often seen in statistical machine learning and data frame processing. To accelerate data analytics on Spark, we have created an acceleration framework, “Frovedis”, for SX-Aurora TSUBASA. It supports several MLlib machine learning algorithms and Data Frame operations, all fully optimized for the vector processor. It is also optimized for distributed systems with multiple vector processors, and its API is mostly the same as that of Spark MLlib and Data Frame. These features enable Spark developers to use multiple vector processors seamlessly from Spark and obtain a large performance improvement. Our performance evaluation shows that Frovedis on the vector processor achieves a 10x to 50x speedup on several machine learning and data frame kernels compared with Spark on Xeon Gold.
Takeo Hosomi is a senior architect at NEC Data Science Research Laboratories. He has broad experience in High Performance Computing and Big Data.