Cloud-native deployment has become one of the major trends in large-scale Big Data analytics. Compared to an on-premises data center, the cloud offers Big Data applications much stronger scalability and higher elasticity. However, the cloud is also considered less performant than on-premises alternatives because of virtualization and cluster resource disaggregation. We present a new cloud-native Spark application architecture backed by persistent memory technology. The key ingredient of this architecture is a novel acceleration engine that uses Intel’s 3D XPoint technology as external memory. We discuss how this new architecture can improve the performance of multiple aspects of data processing. As a key takeaway, the audience will gain an understanding of the benefits of the latest persistent memory technology and of how it can be leveraged in cloud data processing architectures.
Jerry Shao is an expert engineer at Tencent Cloud, focused mainly on Spark, especially Spark core, Spark on YARN, and Spark Streaming. He is an Apache Spark committer and an Apache Livy PMC member. Prior to Tencent, he was a Member of Technical Staff at Hortonworks, working on open source Big Data projects.
Zhuang Peiyu holds a BS from Fudan University. He worked at EMC as a principal engineer for 12 years and is now an engineer at MemVerge. His work focuses on integrating MemVerge's distributed PMEM framework with big data frameworks.