Apache Kylin: Speed Up Cubing with Apache Spark - Databricks

Apache Kylin is a distributed OLAP engine on Hadoop that provides sub-second query latency over datasets scaling to petabytes. Kylin's query performance relies on precalculated multidimensional cubes, which are often time-consuming to build. By default, Kylin uses the MapReduce (MR) Cube Engine, built atop the Hadoop MapReduce framework, to aggregate huge amounts of source data. The MR Engine has been well tuned over the years and has proven stable in hundreds of production deployments. Recently, the Kylin team has been working to further speed up cube building by replacing MR with Spark. Kyligence initiated the new Spark Cube Engine, benchmarked Spark against MR over different datasets, and obtained promising results. Hear about their results and experiences in moving cube building, which is a huge computing task, to Spark.
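To make the idea of a precalculated cube concrete, here is a minimal sketch in plain Python. It is a naive illustration and not Kylin's MR or Spark engine: the sample rows, the dimension names, and the `build_cube` helper are all hypothetical. It aggregates a measure for every combination of dimensions, which shows why the work grows quickly (2^n group-bys for n dimensions) and why a faster build engine matters.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, category, year, sales).
ROWS = [
    ("US", "books", 2016, 10.0),
    ("US", "toys", 2016, 5.0),
    ("CN", "books", 2016, 7.0),
    ("CN", "books", 2017, 3.0),
]
DIMENSIONS = ("country", "category", "year")


def build_cube(rows, dimensions):
    """Sum the last column of each row for every subset of dimensions.

    Returns {cuboid: {key: total}}, where a cuboid is a tuple of
    dimension names and a key is the matching tuple of values.
    """
    cube = {}
    n = len(dimensions)
    for size in range(n + 1):
        for idx in combinations(range(n), size):
            cuboid = tuple(dimensions[i] for i in idx)
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in idx)
                totals[key] += row[-1]  # the measure is the last column
            cube[cuboid] = dict(totals)
    return cube


cube = build_cube(ROWS, DIMENSIONS)
```

Once the cube exists, a query such as "total sales by country" becomes a dictionary lookup, for example `cube[("country",)]`, instead of a scan over the source rows. This sketch rescans every row for each cuboid; a production engine would avoid that and distribute the work.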

Session hashtag: #SFeco7

Learn more:

  • How Customers Win with Apache Spark on Hadoop
  • Apache Spark and Hadoop: Working Together
  • Apache Spark In MapReduce (SIMR)

    About Shaofeng Shi

    Shaofeng Shi is a software architect at Kyligence Inc. He is a committer and PMC member of the Apache Kylin project. He developed several core features in Kylin and has extensive experience in Hadoop and Kylin enablement. Before joining Kyligence, he was a senior software engineer at eBay, CCOE and IBM China Lab.

    About Luke Han

    Luke Han is co-founder and CEO of Kyligence, and co-creator and PMC chair of the Apache Kylin project. For the past few years he has been working on growing Apache Kylin's community, building its ecosystem, and extending its adoption. Prior to Kyligence, he was the Big Data Product Lead at eBay. Prior to eBay, Luke was chief consultant at Actuate China.