Carson Wang is a big data software engineer at Intel, focusing on developing and improving new big data technologies. He is an active open source contributor to the Apache Spark and Alluxio projects. He is also a core developer and maintainer of HiBench – an open sourced bigdata micro benchmark suite. Prior to Intel, Carson worked for Microsoft on Windows Azure.
Spark SQL is a very effective distributed SQL engine for OLAP and widely adopted in Baidu production for many internal BI projects. However, Baidu has also been facing many challenges for large scale including tuning the shuffle parallelism for thousands of jobs, inefficient execution plan, and handling data skew. In this talk, we will explore Intel and Baidu's joint efforts to address challenges in large scale and offer an overview of an adaptive execution mode we implemented for Baidu's Big SQL platform which is based on Spark SQL. At runtime, adaptive execution can change the execution plan to use a better join strategy and handle skewed join automatically. It can also change the number of reducer to better fit the data scale. In general, adaptive execution decreases the effort involved in tuning SQL query parameters and improves the execution performance by choosing a better execution plan and parallelism at runtime. We will also share our experience of using adaptive execution in Baidu's production cluster with thousands of server, where adaptive execution helps to improve the performance of some complex queries by 200%. After further analysis we found that several special scenarios in Baidu data analysis can benefit from the optimization of choosing better join type. We got 2x performance improvement in the scenario where the user wanted to analysis 1000+ advertisers' cost from both web and mobile side and each side has a full information table with 10 TB parquet file per-day. Now we are writing probe jobs to detect more scenarios from current daily jobs of our users. We are also considering to expose the strategy interface based on the detailed metrics collected form adaptive execution mode for the upper users.
Catalyst is an excellent optimizer in SparkSQL, provides open interface for rule-based optimization in planning stage. However, the static (rule-based) optimization will not consider any data distribution at runtime. A technology called Adaptive Execution has been introduced since Spark 2.0 and aims to cover this part, but still pending in early stage. We enhanced the existing Adaptive Execution feature, and focus on the execution plan adjustment at runtime according to different staged intermediate outputs, like set partition numbers for joins and aggregations, avoid unnecessary data shuffling and disk IO, handle data skew cases, and even optimize the join order like CBO etc.. In our benchmark comparison experiments, this feature save huge manual efforts in tuning the parameters like the shuffled partition number, which is error-prone and misleading. In this talk, we will expose the new adaptive execution framework, task scheduling, failover retry mechanism, runtime plan switching etc. At last, we will also share our experience of benchmark 100 -300 TB scale of TPCx-BB in a hundreds of bare metal Spark cluster. Session hashtag: EUdev4