Analytics on the Data Lake With Tableau and the Lakehouse Architecture

Over the past two years we’ve seen a number of organizations move their data work to the cloud. The cloud simplifies access and scales to handle the biggest data volumes. At Tableau, we’re all about customer choice and flexibility, and we’ve enabled our customers to move to the cloud faster than ever.

Analytics and data science/machine learning efforts are beginning to converge, and as a result we’re seeing growing interest in connecting directly to data lakes for analysis. A lot of data arrives in a cloud data lake very quickly, from sources such as web logs and IoT sensors, and it tends to be messy. We need a way to make sense of that data and to have it delivered in a reliable and performant manner.

To enable this, more and more of our customers are adopting a lakehouse architecture. This new architecture combines the best of data lakes (low cost, flexible content structures) and data warehouses (high performance, data reliability) in a single place to store your data. With our partner Databricks, we’ve seen a number of joint customers adopt a lakehouse architecture to power their Tableau deployments. Databricks uses Delta Lake to enable a lakehouse architecture by improving the performance and reliability of the data lake, so Tableau users can query the data lake directly.

This week Databricks is announcing a new SQL Analytics service that is going to provide Tableau customers with an entirely new experience for analyzing data that resides in the data lake. The performance and scale that they can achieve are unlike anything we’ve seen before.

Tableau users will be most excited by the new SQL Analytics endpoints, which can be used immediately with our existing Databricks connector, with no update required. They improve access to your data lake for analytics in two ways:

  • Simple setup. SQL Analytics endpoints simplify the configuration of the Databricks clusters that Tableau uses to query the data lake. Tableau users do not need to deal with cluster management: just connect to a Databricks SQL Analytics endpoint and go!
  • Performance improvements. SQL Analytics uses the Databricks Delta Engine, a vectorized query engine with an improved query optimizer and caching capabilities for fast query performance.
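
As a rough illustration of the simple setup described above, a client connecting to an endpoint typically needs little more than a server hostname and an HTTP path, rather than any cluster configuration. The Python sketch below is not Tableau or Databricks code. The `EndpointConnection` class, the hostname, and the HTTP path format are hypothetical placeholders used only to show the shape of the information involved; check your own workspace for the real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConnection:
    """Hypothetical holder for the details a BI client needs to reach an endpoint."""

    server_hostname: str  # bare workspace host name, without "https://"
    http_path: str        # path identifying the endpoint (format assumed here)

    def validate(self) -> None:
        # Basic sanity checks only; this does not contact any server.
        if "://" in self.server_hostname or "/" in self.server_hostname:
            raise ValueError("server_hostname should be a bare host name")
        if not self.http_path.startswith("/"):
            raise ValueError("http_path should start with '/'")

    def summary(self) -> str:
        # Validate first, then return a one-line description of the connection.
        self.validate()
        return f"host={self.server_hostname} path={self.http_path}"


# Placeholder values: replace with the details shown for your own endpoint.
conn = EndpointConnection(
    server_hostname="example-workspace.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/0123456789abcdef",
)
print(conn.summary())
```

Note that nothing in the sketch refers to cluster size, node types, or runtime versions, which is the point of the first bullet: the person connecting from Tableau only supplies connection details.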

Figure 1. Delta Engine architecture used with the new SQL Analytics service for Tableau from Databricks.

Customer Examples

Here are some examples of customers who are using a Lakehouse architecture with Databricks and Tableau.

  • Wehkamp uses Databricks with Delta Lake as a data lake that serves their entire organization, with Tableau for reporting and ad hoc analysis and Databricks for data science. You can read about Wehkamp’s implementation in this case study.
  • Flipp, a retail service provider, uses Databricks with Delta Lake to create a lakehouse that their data science team uses for machine learning, their engineering team uses for product feature analysis, and their sales team uses to provide analysis to their customers with Tableau. You can watch their session at the Tableau Conference.
  • The US Air Force uses Databricks with Delta Lake to manage all their cash flow analytics, and then presents the results in Tableau, analyzing over 65 million records per quarter. You can watch the US Air Force present their implementation at the Data + AI Industry Leadership Forum.
Figure 2. A sample Flipp visualization

Learn more about Databricks and Tableau here.
