Skip to main content

We’re happy to introduce a new, open source connector with Redash, a cloud-based SQL analytics service, to make it easy to query data lakes with Databricks. Traditionally, data analyst teams face issues with stale and partial data compromising the quality of their work, and want to be able to connect to the most complete and recent data available in data lakes, but their tools are often not optimized for querying data lakes. We’ve been working with the Redash team to improve a new Databricks connector that makes it easy for analysts to perform SQL queries and build dashboards directly against data lakes, including open source Delta Lake architectures, to deliver better decision making and insights.

Redash and Databricks Overview

Redash is an open source SQL-based service for analytics and dashboard visualizations. It offers a familiar SQL editor interface to browse the data schema, build queries, and view results that should be familiar to anyone who’s worked with a relational database. Queries can be easily converted into visualizations for quick insights, connected to alerts to notify on specific data events, or managed by API for automated workflows. Live web-based reports can be shared, refreshed, and modified by other teams for easy collaboration. These capabilities are driven by the Redash open source community, with over 300 contributors and more than 7,000 open source deployments of Redash globally.

With the new, performance-optimized and open source connector, Redash offers fast and easy data connectivity to Databricks for querying your data lake, including Delta Lake architectures. Delta Lake adds an open source storage layer to data lakes to improve reliability and performance, with data quality guarantees like ACID transactions, schema enforcement, and time travel, for both streaming and batch data in cloud blob storage. An optimized Spark SQL runtime running on scalable cloud infrastructure provides a powerful, distributed query engine for these large volumes of data. Together, these storage and compute layers on Databricks ensure data teams get reliable SQL queries and fast visualizations with Redash.

How to get started with the Redash Connector

You can connect Redash to Databricks in minutes. After creating a free Redash account, you’re prompted to connect to a “New Data Source”. Select “Databricks” as the data source from the menu of available options.

Easily select Databricks as a data source from the Redash interface.

The next screen prompts you for the necessary configuration details to securely connect to Databricks. You can check out the documentation for more details.

Securely connecting to Databricks using the Redash connector.

With the Databricks data source connected, you can now run SQL queries on Delta Lake tables as if it were any other relational data source, and quickly visualize the query results.

With the Redash connector, you can use Redash to run SQL queries on Delta Lake tables as if it were any other relational data source, and quickly visualize the query results.

Recap

Combining Redash's open source SQL analytics capabilities with Databricks's open data lakes gives SQL analysts an easy and powerful way to edit queries and create visualizations and dashboards directly on the organization’s most recent and complete data.  Learn more from the Redash documentation.

Try Databricks for free

Related posts

See all Company Blog posts