Ahsan Javed Awan is a third-year Erasmus Mundus Joint Doctoral Fellow at KTH, Sweden and UPC, Spain. He is jointly supervised by Prof. Mats Brorsson (KTH) and Prof. Eduard Ayguade (UPC/BSC). He previously worked as a Lecturer at the National University of Sciences and Technology (NUST), Pakistan. He holds an Erasmus Mundus Joint Master's Degree in Embedded Computing Systems from TU Kaiserslautern, Germany, the University of Southampton, UK and NTNU, Norway, and a B.E. degree in Mechatronics Engineering from NUST, Pakistan. He is currently a visiting researcher at the Barcelona Supercomputing Center.
The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark defines the state of the art in big data analytics platforms by (i) exploiting data-flow and in-memory computing and (ii) exhibiting superior scale-out performance on commodity machines, little effort has been devoted to understanding the performance of in-memory data analytics with Spark on modern scale-up servers. This thesis characterizes the performance of in-memory data analytics with Spark on scale-up servers. Through empirical evaluation of representative benchmark workloads on a dual-socket server, we have found that in-memory data analytics with Spark exhibits poor multi-core scalability beyond 12 cores due to thread-level load imbalance and work-time inflation. We have also found that the workloads are bound by the latency of frequent data accesses to DRAM. As the input data size grows, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (due to fewer L1 cache misses and higher core utilization). For data accesses, we have found that simultaneous multi-threading is effective in hiding data access latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average, and (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%. To reduce the impact of garbage collection, we match the memory behaviour of applications with the garbage collector, improving application performance by 1.6x to 3x, and we recommend using multiple small executors, which can provide up to a 36% speedup over a single large executor.
Scale-out big data processing frameworks like Apache Spark have been designed to run on off-the-shelf commodity machines, where each machine has a modest amount of compute, memory and storage capacity. Recent advances in hardware technology motivate understanding Spark performance on novel hardware architectures. Our earlier work has shown that the performance of Spark-based data analytics is bounded by frequent accesses to DRAM. In this talk, we argue in favor of Near Data Computing (NDC) architectures for Apache Spark, which enable processing the data where it resides (e.g., Smart SSDs and Compute Memories). We envision a programmable-logic-based hybrid near-memory and near-storage compute architecture for Apache Spark. Furthermore, we discuss the challenges involved in achieving a 10x performance gain for Apache Spark on NDC architectures. Session hashtag: #EUres10