The AMPLab at UC Berkeley, with help from Databricks, recently released an update to the Big Data Benchmark. This benchmark uses Amazon EC2 to compare the performance of five popular SQL query engines in the Big Data ecosystem on common types of queries. The results can be reproduced using publicly available scripts and datasets.
Over the past year, the community has invested heavily in performance optimizations of query engines. We are glad to see that all projects have evolved in this area. Although the queries used in the benchmark are simple, we are proud that Shark remains one of the fastest engines for these workloads and has improved significantly since the last run.
While this benchmark reaffirms Shark as a highly performant SQL query engine, we are working hard at Databricks to push the boundaries further. Stay tuned for some exciting news we will share soon with the community.