Shaofeng Shi - Databricks

Shaofeng Shi

Software Architect, Kyligence Inc

Shaofeng Shi is a software architect at Kyligence Inc. He is a committer and PMC member of the Apache Kylin project. He developed several core features in Kylin and has extensive experience in Hadoop and Kylin enablement. Before joining Kyligence, he was a senior software engineer at eBay, CCOE and IBM China Lab.



Apache Kylin: Speed Up Cubing with Apache Spark
Summit 2017

Apache Kylin is a distributed OLAP engine on Hadoop that provides sub-second query latency over datasets scaling to petabytes. Kylin's query performance relies on precalculated multidimensional cubes, which are often time-consuming to build. By default, Kylin uses the MapReduce Cube Engine, built atop the Hadoop MapReduce framework, to aggregate huge amounts of source data. The MR Engine has been tuned over the years and has proven stable in hundreds of production deployments. Recently, the Kylin team has been working to further speed up cube building by replacing MR with Spark. Kyligence initiated the new Spark Cube Engine, benchmarked Spark against MR over different datasets, and saw promising results. Hear about those results and the team's experience moving cube building, a huge computing task, to Spark.

Session hashtag: #SFeco7

Learn more:

  • How Customers Win with Apache Spark on Hadoop
  • Apache Spark and Hadoop: Working Together
  • Apache Spark In MapReduce (SIMR)