New Capabilities Help Data Teams Build a “Lakehouse” for Unified Data Analytics
SAN FRANCISCO, June 24, 2020 – Databricks, the data and AI company, today announced the availability of Delta Engine and the acquisition of Redash. These new capabilities make it faster and easier for data teams to use its Unified Data Analytics platform for data science, machine learning, and a broad range of data analytics use cases. Delta Engine is a high-performance query engine for cloud data lakes, and Redash is an open-source dashboarding and visualization service that data scientists and analysts use for data exploration.
Delta Engine is tailored for use with Delta Lake, the popular open-source structured transaction layer that brings quality and reliability to data lakes. Organizations can now build curated data lakes that include structured and semi-structured data and run all their analytics on high quality, fresh data in the cloud. Databricks acquired Redash, the company behind the successful Redash open source project, to provide easy-to-use dashboarding and visualization capabilities on these curated data lakes. With Redash, data scientists and SQL analysts can eliminate the complexity of moving data into other systems for analysis.
Together, these enhancements enable organizations to adopt a single, simplified cloud architecture for data management, helping them significantly reduce costs and complexity and accelerate data team productivity.
They are also a response to the emerging “Lakehouse” design pattern, which many enterprise organizations have adopted to bring structured transactions, quality and performance to their cloud data lakes. The announcements were made today at Spark + AI Summit, which takes place virtually this week with over 60,000 members of the data community from over 100 countries.
“Most organizations who want to do data science and data warehousing are using multiple architectures. Data is stuck in organizational silos, defined by closed and proprietary systems that slow organizations down and make it harder to arrive at high quality decisions because information is fragmented and out of date,” said Ali Ghodsi, cofounder and CEO at Databricks. “Curated cloud data lakes provide organizations a way to run any kind of analytics, including data science and machine learning, on all their most recent data. Our introduction of Delta Engine and the acquisition of Redash are significant steps forward in helping organizations build these high quality, curated data lakes that some call ‘Lakehouses’.”
Delta Engine Enables Fast Query Performance on Delta Lake
Traditional data analytics on structured and semi-structured data demand very fast performance to keep up with the pace of business. Historically, organizations have duplicated the data in their data lakes across a variety of data warehouses and operational systems because the tools to query and analyze data in data lakes have not been well suited for fast query execution. However, managing this architectural complexity introduces challenges, including fragmented and inconsistent data silos and substantially increased costs.
Databricks’ new Delta Engine for Delta Lake enables fast query execution for data analytics and data science, without moving the data out of the data lake. The high-performance query engine has been built from the ground up to take advantage of modern cloud hardware for accelerated query performance. With this improvement, Databricks customers are able to move to a unified data analytics platform that can support any data use case and result in meaningful operational efficiencies and cost savings.
Delta Lake was released in 2017 by Databricks and donated to the Linux Foundation in 2019. Since its introduction, Delta Lake has been adopted by Comcast, Condé Nast, Nielsen, FINRA, Shell, and thousands of other organizations. Today’s announcements build upon the success of the Delta Lake project to extend beyond storing and managing data into how data is used and consumed.
Redash Makes It Easy for Data Scientists and Analysts to Consume Data
The open source Redash project was created to help data teams make sense of their data. Data scientists and SQL analysts can easily gather a wide variety of data sources, including operational databases, data lakes, and Delta Lake, into thematic dashboards. The results can be visualized in a wide variety of formats like charts, cohorts, and funnels, and are easily shareable across an organization or with external users.
Millions of users at thousands of organizations are already using Redash to develop insights and make data actionable. The open source project was created by a passionate community of developers and has been built by over 300 contributors from around the world since it launched in 2013. The open source Redash project can be used with Databricks today through a free connector. In the coming months, Redash will be fully integrated into Databricks’ Unified Data Analytics Platform and the Databricks workspace, where it will take advantage of capabilities like Delta Engine.
About Databricks
Databricks is the data and AI company. Thousands of organizations worldwide — including Comcast, Condé Nast, Nationwide and H&M — rely on Databricks’ open and unified platform for data engineering, machine learning and analytics. Databricks is venture-backed and headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.