Apache Kudu

What is Apache Kudu?

Apache Kudu is a free and open source columnar storage engine developed for Apache Hadoop. It is designed for structured data and supports low-latency, millisecond-scale random access to individual rows together with efficient analytical access patterns. It was created to bridge the gap between the widely used Hadoop Distributed File System (HDFS) and the HBase NoSQL database.
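
To make "columnar" concrete, the following is a toy Python sketch, not Kudu's storage format or API (the class name ColumnarTable and its methods are hypothetical). Each column lives in its own list, so an analytical scan over one column reads only that column, while a single-row lookup gathers one value from every column at the same position.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnarTable:
    """Toy table that keeps each column in its own list (hypothetical, not Kudu's API)."""
    columns: dict[str, list] = field(default_factory=dict)

    def insert(self, row: dict) -> None:
        # Append each field to its own column; assumes every row has the same schema.
        for name, value in row.items():
            self.columns.setdefault(name, []).append(value)

    def get_row(self, index: int) -> dict:
        # Random access: gather the value at `index` from every column.
        return {name: values[index] for name, values in self.columns.items()}

    def sum_column(self, name: str) -> float:
        # Analytical scan: touches only the single column it needs.
        return sum(self.columns[name])


table = ColumnarTable()
table.insert({"id": 1, "region": "eu", "amount": 10.0})
table.insert({"id": 2, "region": "us", "amount": 32.5})
print(table.get_row(1))             # {'id': 2, 'region': 'us', 'amount': 32.5}
print(table.sum_column("amount"))   # 42.5
```

Real columnar engines add encoding, compression, and on-disk layout on top of this idea. The sketch only shows why scans over a few columns avoid reading the rest.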

Main advantages of Apache Kudu for business intelligence (BI) on Hadoop

Enables real-time analytics on fast data

Apache Kudu combines the strengths of HBase and Parquet. It is as fast as HBase at ingesting data and almost as fast as Parquet for analytical queries. It supports several types of queries, allowing you to:

  • Look up a specific value by its key.
  • Look up a range of keys, which are stored in sorted key order.
  • Run arbitrary queries across as many columns as needed.
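
The three query types above can be illustrated with a small in-memory sketch. This is a hypothetical toy store kept in primary-key order, not Kudu's implementation or client API: a point lookup and a range scan both use binary search over the sorted keys, while an arbitrary query evaluates a predicate on every row.

```python
import bisect


class SortedKeyStore:
    """Toy store kept in primary-key order (an illustration, not Kudu's implementation)."""

    def __init__(self):
        self._keys = []   # sorted primary keys
        self._rows = []   # row dicts, aligned with self._keys

    def upsert(self, key, row):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._rows[i] = row          # key exists: replace the row in place
        else:
            self._keys.insert(i, key)    # new key: insert at its sorted position
            self._rows.insert(i, row)

    def get(self, key):
        # 1. Point lookup by key using binary search.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i]
        return None

    def range(self, lo, hi):
        # 2. Range scan over keys in [lo, hi); cheap because keys are sorted.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return self._rows[start:end]

    def where(self, predicate):
        # 3. Arbitrary query: evaluate a predicate over any combination of columns.
        return [row for row in self._rows if predicate(row)]


store = SortedKeyStore()
for k, city, temp in [(3, "Oslo", 4), (1, "Rome", 18), (2, "Lima", 21), (5, "Cairo", 30)]:
    store.upsert(k, {"id": k, "city": city, "temp": temp})

print(store.get(2)["city"])                                        # Lima
print([r["id"] for r in store.range(2, 5)])                        # [2, 3]
print([r["city"] for r in store.where(lambda r: r["temp"] > 20)])  # ['Lima', 'Cairo']
```

Keeping rows sorted by key is what makes the first two patterns fast; the third pattern is a full scan in this toy, whereas a columnar engine can limit that scan to the columns the predicate touches.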

Fully distributed and fault tolerant

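Apache Kudu is fully distributed: each table is split into tablets that are spread across many servers, so a cluster can be scaled horizontally as required. Each tablet is replicated using the Raft consensus algorithm, which keeps the replicas consistent and lets a tablet remain available as long as a majority of its replicas are up. In addition, Kudu supports updating rows in place.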
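
The fault-tolerance guarantee follows from Raft's majority rule: a write is committed once a majority of a tablet's replicas acknowledge it. The short sketch below (the helper names are illustrative) computes how many replica failures a tablet can survive for a given replication factor. With a replication factor of 3, Kudu's default, one replica can fail.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority is still available."""
    return replicas - majority(replicas)


for n in (1, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 5 3 2
```

Note that an even replica count buys nothing extra: 4 replicas need a majority of 3 and still tolerate only one failure, the same as 3 replicas.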

Takes advantage of the next generation of hardware

Apache Kudu is optimized for SSDs and designed to take advantage of emerging persistent memory. It can scale to tens of cores per server and can even use SIMD instructions for data-parallel computation.

Provides the mutability required for BI on big data

Because rows can be updated, Kudu supports slowly changing dimension (SCD) patterns, which let users track changes to dimensional reference data over time.
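
As an illustration of why mutability matters here, the following is a minimal sketch of the common "Type 2" SCD pattern in plain Python. It is a generic technique, not a Kudu API, and the record fields (key, attrs, valid_from, valid_to) are hypothetical. When an attribute changes, the current version is closed with an in-place update and a new version is appended. On append-only file formats, that in-place update typically means rewriting files.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, as_of):
    """Type 2 SCD: close the current version of `key` and append a new one.

    `history` is a list of dicts with hypothetical fields
    key, attrs, valid_from, valid_to (valid_to is None for the current version).
    """
    for version in history:
        if version["key"] == key and version["valid_to"] is None:
            if version["attrs"] == new_attrs:
                return history               # nothing changed; keep the current version
            version["valid_to"] = as_of      # close the old version (an in-place update)
    history.append({"key": key, "attrs": new_attrs,
                    "valid_from": as_of, "valid_to": None})
    return history


customers = []
apply_scd2(customers, 42, {"city": "Berlin"}, date(2023, 1, 1))
apply_scd2(customers, 42, {"city": "Munich"}, date(2024, 6, 1))
for v in customers:
    print(v["attrs"]["city"], v["valid_from"], v["valid_to"])
# Berlin 2023-01-01 2024-06-01
# Munich 2024-06-01 None
```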

Kudu supports SQL when used with Spark or Impala

Do you want to access data via SQL? Apache Kudu integrates tightly with Apache Impala and Apache Spark, so you can use these tools to insert, query, update, and delete data in Kudu tables using their SQL syntax. Moreover, by using Impala as the gateway, you can connect existing or new applications (regardless of the language they are written in), frameworks, and business intelligence tools to your Kudu data over JDBC or ODBC.
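
For a rough idea of what this looks like in practice, the sketch below lists Impala SQL statements that create a Kudu-backed table and then insert, update, query, and delete rows. It runs them through any Python DB-API 2.0 connection. The table name, columns, and partition count are hypothetical. The assumption that you would obtain the connection from the impyla package (`impala.dbapi.connect(host=..., port=21050)`) reflects a typical setup rather than a requirement, so check the Impala and Kudu documentation for your versions.

```python
# Impala SQL for a Kudu table (hypothetical table and column names).
STATEMENTS = [
    """CREATE TABLE IF NOT EXISTS sensor_readings (
         sensor_id BIGINT,
         ts BIGINT,
         reading DOUBLE,
         PRIMARY KEY (sensor_id, ts)
       )
       PARTITION BY HASH (sensor_id) PARTITIONS 4
       STORED AS KUDU""",
    "INSERT INTO sensor_readings VALUES (1, 1700000000, 20.5)",
    "UPDATE sensor_readings SET reading = 21.0 WHERE sensor_id = 1 AND ts = 1700000000",
    "SELECT sensor_id, reading FROM sensor_readings WHERE sensor_id = 1",
    "DELETE FROM sensor_readings WHERE sensor_id = 1",
]


def run(conn, statements):
    """Execute statements on any DB-API 2.0 connection; return rows from SELECTs."""
    results = []
    cur = conn.cursor()
    for sql in statements:
        cur.execute(sql)
        # Only SELECT statements produce a result set to fetch.
        if sql.lstrip().upper().startswith("SELECT"):
            results.append(cur.fetchall())
    cur.close()
    return results
```

Assuming impyla is installed and an Impala daemon is reachable, usage would look like `run(connect(host="impala-host", port=21050), STATEMENTS)`, where "impala-host" is a placeholder. Because the helper only relies on the standard DB-API cursor methods, the same pattern applies to other DB-API drivers, and JDBC/ODBC clients follow the same execute-then-fetch flow.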
