Chendi Xue is a software engineer from Intel data analytics team. She has six years’ experience in bigdata and cloud system optimization, focusing on computation, storage, network software stack performance analysis and optimization. She participated in the development works including Spark-Sql, Spark-Shuffle optimization, cache implementation, etc. Before that, she worked on Linux Device-Mapper optimization and iscsi optimization during her master degree study.a
November 17, 2020 04:00 PM PT
Spark SQL works very well with structured row-based data. Vectorized reader and writer for parquet/orc can make I/O much faster. It also used WholeStageCodeGen to improve the performance by Java JIT code. However Java JIT is usually not working very well on utilizing latest SIMD instructions under complicated queries. Apache Arrow provides columnar in-memory layout and SIMD optimized kernels as well as a LLVM based SQL engine Gandiva. These native based libraries can accelerate Spark SQL by reduce the CPU usage for both I/O and execution.
In this session, we would like to take a deep dive on we build Native SQL engine for Spark by leveraging Arrow Gandiva and its compute kernels. We will introduce the general design of commonly used operators like aggregation, sorting and joining, and discuss how can we optimize these operators with SIMD based instructions. We will also introduce how to implement WholeStageCodeGen with Native libraries. Finally we will use micro-benchmarks and TPCH workloads to explain how vectorized execution can benefit these workloads.
Speakers: Yuan Zhou and Chendi Xue
June 24, 2020 05:00 PM PT
The increasing challenge to serve ever-growing data driven by AI and analytics workloads makes disaggregated storage and compute more attractive as it enables companies to scale their storage and compute capacity independently to match data & compute growth rate. Cloud based big data services is gaining momentum as it provides simplified management, elasticity, and pay-as-you-go model. However, Spark shuffle brings performance, scalability and reliability issues in the disaggregated architecture. Shuffle is an I/O intensive operation, which will lead to performance issues if using a typical cloud provisioned volume as shuffle media. Meanwhile, the shuffle operation of different tasks may interfere with each other thus limits Spark's scalability. Moreover, shuffle re-computation in case of compute node failure poses significant overhead for long running jobs. Disaggregating shuffle from compute node is becoming more and more important for cloud-native Spark application.
In this session, we propose a new fully disaggregated shuffle solution that leverage state-of-art hardware technologies including persistent memory and RDMA. It includes: a new pluggable shuffle manager, a persistent memory based distributed storage system, a RDMA powered network library and an innovative approach to use persistent memory as both shuffle media as well as RDMA memory region to reduce additional memory copies and context switch. This new remote shuffle solution improved Spark scalability by disaggregating Spark shuffle from compute node to a high-performance distributed storage, improved spark shuffle performance with high speed persistent memory and low latency RDMA network, and improved reliability by providing shuffle data replication and fault-tolerant optimization. Experiment performance numbers will be also presented, which demonstrates up to 10x performance speedup over traditional shuffle solution and three orders of magnitude reduction in terms of shuffle read block time.