Scale-out big data processing frameworks like Apache Spark have been designed to run on off-the-shelf commodity machines, each with a modest amount of compute, memory, and storage capacity. Recent advances in hardware technology motivate understanding Spark's performance on novel hardware architectures. Our earlier work has shown that the performance of Spark-based data analytics is bounded by frequent accesses to DRAM. In this talk, we argue in favor of Near Data Computing (NDC) architectures, which process data where it resides (e.g., Smart SSDs and Compute Memories), for Apache Spark. We envision a programmable-logic-based hybrid near-memory and near-storage compute architecture for Apache Spark. Furthermore, we discuss the challenges involved in achieving a 10x performance gain for Apache Spark on NDC architectures.
Session hashtag: #EUres10
Ahsan Javed Awan is an Erasmus Mundus Joint Doctoral Fellow at KTH, Sweden and UPC, Spain. He has been working on "Architecture Support for Apache Spark based Big Data Analytics" for the last 4 years. He has previously interned at IBM Research Tokyo, Japan and Recore Systems, Netherlands. He was a visiting researcher at Barcelona Supercomputing Center, Spain and also worked as a Lecturer at National University of Sciences and Technology (NUST), Pakistan. He holds an Erasmus Mundus Joint Master's Degree in Embedded Computing Systems from TU Kaiserslautern, Germany, University of Southampton, UK, and NTNU, Norway, and a B.E. degree in Mechatronics Engineering from NUST, Pakistan.