
Srivatsan Krishnan

Design Engineer, Intel

Srivatsan Krishnan is a design engineer at Intel. He is currently designing and exploring accelerator architectures with the Intel Xeon with Integrated FPGA team. He is the technical lead for the accelerator used in the ALS scoring application, which is interfaced with the Spark ML library via the Intel Data Analytics Acceleration Library. His interests include designing novel accelerators for machine learning/AI and big data applications. He is also interested in architectural support for runtime systems and frameworks, and in improving system performance through optimization across the different layers of the stack.

PAST SESSIONS

Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform (Summit 2017)

FPGAs have recently gained attention throughout the industry because of their performance-per-watt efficiency, re-programmable flexibility, and wide applicability. In anticipation of this trend, Intel has been planning a new product line that offers a Xeon processor with an integrated FPGA, enabling datacenters to easily deploy high-performance accelerators at a relatively low cost of ownership. The new Xeon+FPGA platform is supported by a software ecosystem that removes the obstacles traditional FPGA devices faced, such as datacenter-wide accelerator deployment. In this session, Intel will present their design and implementation of the FPGA as a supplement to vcores in Spark's YARN mode to accelerate SparkML applications on the Intel Xeon+FPGA platform. In particular, they have added new options to Spark core that provide an interface for the user to describe the accelerator dependencies of the application. The FPGA information in the Spark context is used by the new APIs and the DRF (Dominant Resource Fairness) policy implemented on YARN to schedule Spark executors onto hosts with Xeon+FPGA installed. Experimental results using an ALS scoring application that accelerates General Matrix-Matrix Multiplication (GEMM) operations demonstrate that Xeon+FPGA improves FLOPS throughput by 1.5× compared to a CPU-only cluster. Session hashtag: #SFr9
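
To make the scheduling flow above concrete, here is a minimal Scala sketch of how an application might declare its FPGA dependency and run ALS scoring on Spark. The configuration keys (spark.executor.accelerator.type, spark.executor.accelerator.count) and the file paths are illustrative assumptions, since the abstract does not name the actual options added to Spark core; the scoring itself uses the standard Spark ML ALSModel API, whose transform step is dominated by the GEMM-shaped factor products the FPGA targets.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.recommendation.ALSModel

object FpgaAlsScoringSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical accelerator-dependency options: the abstract says new
    // options were added to Spark core but does not name them, so these
    // keys are placeholders for illustration only.
    val spark = SparkSession.builder()
      .appName("ALS scoring on Xeon+FPGA")
      .config("spark.executor.accelerator.type", "fpga")   // assumed key
      .config("spark.executor.accelerator.count", "1")     // assumed key
      .getOrCreate()

    // Load a previously trained ALS model. Scoring a batch of (user, item)
    // pairs amounts to dot products of the user and item latent-factor
    // matrices -- the GEMM-shaped workload the Xeon+FPGA accelerator targets.
    val model = ALSModel.load("/path/to/als-model")

    // Candidate pairs must contain the model's user and item columns
    // ("user" and "item" by default).
    val candidates = spark.read.parquet("/path/to/user-item-pairs")

    val scored = model.transform(candidates)  // adds a "prediction" column
    scored.write.mode("overwrite").parquet("/path/to/scores")

    spark.stop()
  }
}
```

On the cluster side, the abstract describes the DRF policy on YARN consuming this accelerator information to place executors on hosts with Xeon+FPGA installed, so apart from the declared dependency the application code would remain a standard SparkML job.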