In order to support the use cases that are important to data analysts and business users, Databricks SQL provides data warehousing performance at data lake economics.
In this demo, we walk through some of the features of the new Databricks SQL that are important to data analysts, including the integrated data browser, SQL query editor with live autocomplete, built-in data visualization tools, and flexible dashboarding and alerting capabilities. We also cover how Databricks SQL endpoints provide a high-performance, low latency, SQL-optimized compute resource that can power your existing BI tools like PowerBI and Tableau.
The lakehouse is an open, simple platform to store and manage all of your data, and support all of your analytics and AI use cases. Databricks SQL allows data analysts and SQL users to query all of the data in the lakehouse, by offering scalable, cost-effective clusters optimized for SQL workloads.
When combined with Delta Lake, the open source storage layer for data lakes, you get the reliability of ACID transactions and query results with lightning fast performance, even with multiple concurrent users.
Databricks SQL also makes it easier to quickly visualize and share insights with a new and easy to use SQL query editor, rich dashboards, and automatic alerts. Or, you can connect Databricks SQL endpoints to a broad set of BI tools including Tableau and Power BI for reliable data and fast query performance.
Databricks SQL allows you to build a lakehouse architecture that provides data warehousing performance at data lake economics, for up to 9x better price/performance than traditional cloud data warehouses.
Let’s look at an example. Imagine that you and I are data analysts at a large coffee company, and we want to determine which countries to source beans from this week to get the best value.
To find some relevant data, let’s look through the data lake using the Data Browser. Once we choose the “coffee” database, we can see the tables it contains, and the schema of each one, including the names and data types of each column.
This looks like the data we need. Let’s write a query to find the average rating of coffee beans in each country using the interactive query editor. Databricks SQL’s live autocomplete offers suggestions to help us find what we’re searching for quickly, and pressing the tab button accepts the suggestion.
A chart could probably help us understand the information more easily, so let’s visualize this data. As you can see we have lots of options — funnels, Sankey charts, Sunbursts, and more — but let’s choose a map since our data is grouped by country.
Looks good. We’re going to want to keep an eye on this query, so let’s give it a name, add some searchable tags, and schedule it to run every 24 hours, since new bean rating reports come streaming in all the time. Just like coffee beans, we want to keep our data fresh, and never stale.
Perfect – the rest of our data team should see this right away. Databricks SQL makes it easy to add this chart to a dashboard and share it directly with key stakeholders. We can also use the access controls to tailor permissions on a group-wide basis.
So now that we have some good ratings data, we should also bring in some price data to help our stakeholders understand which beans offer the highest quality at the lowest price.
So right now it looks like Brazilian coffee offers us the best value. But if that changes, I’d like to know about it without having to check the dashboard everyday. So with Databricks SQL, we can set up value-based alerts, so that we’ll get an email if the value per dollar dips below the threshold that we define.
Now, just to switch gears a bit, let’s imagine that we’re administrators, and let’s walk through some of the controls we can use to manage our Lakehouse.
We can manage SQL Endpoints from the “Endpoints” tab. SQL Endpoints are optimized for low latency SQL workloads and offer fast query performance under high concurrency. They are easy to set up too! We can select the size of cluster we want using simple “T-shirt sizing.” And we can add multi-cluster load balancing if we know we’re going to have a lot of analysts querying the data at once, so that query performance doesn’t suffer, even under heavy load. That endpoint will automatically scale according to how many users are trying to access the Data Lake.
We can also manage the Query History. The Query History allows us to view all of the recent queries that ran on this Endpoint. It looks like someone has been doing a lot of querying of the coffee database! We can sort by query, endpoint, user, and time. Once we’ve selected a query, we can view details including phases of execution and query performance. This information is useful for debugging and performance optimization, as well as for audit and governance purposes.
So the Databricks SQL UI works great for exploratory data analysis, quick dashboarding, and setting up alerts for you and your team. But if you already use an existing BI application like Tableau, Looker, or PowerBI, you can still use Databricks SQL on the backend as the scalable query engine that provides clean, reliable data and fast query performance to power your applications. Databricks SQL Endpoints offer a reliable connection to your Delta Lake, and seamlessly integrate with third party BI tools using Single Sign On.
Here we’ve set up a connection to Tableau that allows us to view the tables in the “coffee” database we queried earlier. We were also able to recreate the dashboard we built earlier in Tableau, using the Databricks SQL UI.
Databricks SQL offers data analysts a fresh new way to work and play with data in Databricks. Try Databricks SQL on Databricks, today.