Apache Kylin

Try Databricks for free

What is Apache Kylin?

Apache Kylin is a distributed open source online analytics processing (OLAP) engine for interactive analytics Big Data. Apache Kylin has been designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark. In addition, it easily integrates with BI tools via ODBC driver, JDBC driver, and REST API. It was created by eBay in 2014, graduated to Top Level Project of Apache Software Foundation just one year later, in 2015 and won the Best Open Source Big Data Tool in 2015 as well as in 2016. Currently, it is being used by thousands of companies worldwide as their critical analytics application for Big Data. While other OLAP engines struggle with the data volume, Kylin enables query responses in the milliseconds. It provides sub-second level query latency over datasets scaling to petabytes. It gets its amazing speed by precomputing the various dimensional combinations and the measure aggregates via Hive queries and populating HBase with the results. Apache Kylin Infographic

How Does Apache Kylin Work?

The Kylin query engine which can be accessed in Kylin’s user-friendly UI, via an API or via JDBC will leverage the Apache Calcite query processor and HBase features for rapid lookups. Kylin relies upon the Hadoop Eco-system:

Hive – Input source, pre-join star schema during cube building
MapReduce – Aggregate metrics during cube building
HDFS – Store intermediate files during cube building
HBase – Store and query data cubes
Calcite – SQL parsing, code generation, optimization How can Apache Kylin help your organization?
Very Fast OLAP Engine at Scale - Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data to seconds
ANSI SQL Interface on Hadoop - Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions. It can easily be used by both analysts and engineers as no programming needed
Seamless Integration with BI Tools - Kylin currently offers integration capability with BI Tools like Tableau, JDBC/ODBC/Rest API
Interactive Query Capability - Users can interact with Hadoop data via Kylin at sub-second latency
MOLAP cube query serving on billions of rows - Users have the ability to define a data model and pre-build in Kylin even if it has more than 10+ billions of raw data records.

Open-source ODBC driver - Kylin’s ODBC driver is built from scratch and works very well with Tableau.

Additional Resources

Apache Kylin: Speed Up Cubing with Apache Spark

Back to Glossary