Databricks Launches Delta to Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

October 25, 2017

Industry’s first unified data management system delivers the scale of a data lake, the reliability and query performance of a data warehouse, and the low latency of streaming

San Francisco, Calif. and Dublin, Ireland — October 25, 2017 – Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark™, today announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the reliability and query performance of a data warehouse, and the low latency of a streaming ingest system. Databricks Delta, a key component of the Databricks Unified Analytics Platform that runs in the cloud, eliminates the architectural complexity and operational overhead of maintaining three disparate systems: data lakes, data warehouses and streaming systems. With Delta, enterprise organizations no longer need complex, brittle extract, transform, and load (ETL) processes that run across a variety of systems and create high latency just to obtain access to relevant, business-critical data.

“At Edmunds, obtaining real-time customer and revenue insights is critical to our business. But we’ve always been challenged with complex ETL processing that slows down our access to data,” said Greg Rokita, executive director of technology at Edmunds.com. “Databricks Delta allows us to overcome this roadblock by blending the performance of a data warehouse with the scale and cost-efficiency of a data lake. We now have a simplified data architecture that enables immediate access to business-critical data.”

“Many enterprise organizations are struggling with the limitations of data lakes and data warehouses as well as the complexity of managing both and moving data between them,” said Ali Ghodsi, cofounder and chief executive officer at Databricks. “Delta combines the reliability and performance of data warehouses with the scale of data lakes and low latency of streaming systems. With this unified management system, enterprises now benefit from a simplified data architecture, up to 100x increase in query performance, and faster access to relevant data - increasing their ability to make decisions that drive results. We have solved a massive struggle facing organizations that are on a mission to run their business in real-time.”

Databricks Delta delivers the following capabilities to simplify enterprise data management:

Manage Continuously Changing Data Reliably: The industry’s first unified data management system simplifies pipelines by allowing Delta tables to be used as both a data source and a sink. Delta tables provide transactional guarantees for multiple concurrent writers, including batch and streaming jobs. Delta natively supports the real-time needs of the business by enabling a streaming data warehouse to return the most recent, consistent view of the writes. Upserts in Delta provide a clean way to change data after it has been written, instead of running the entire job again.
Perform Fast Queries Without Manual Tuning: Delta automates performance management, removing the need for tedious manual performance tuning. Self-optimizing data layout ensures that data queried together is stored together. Delta automates compaction of small files for efficient reads. Intelligent data skipping and indexing lead to massive speedups by not reading unneeded data. Automated caching makes subsequent reads an order of magnitude faster.
Provide Cost Efficiency and Scale of Data Lakes: Delta stores all its data in Amazon S3 for cost-efficiency and massive scale. The data in Delta is stored in a non-proprietary and open file format to ensure data portability and prevent vendor lock-in.
Integrate with the Unified Analytics Platform: Databricks Delta data can be accessed from any Spark application running on the Databricks platform through the standard Spark APIs. Delta also integrates into the Databricks Enterprise Security model, including cell-level access control, auditing, and HIPAA-compliant processing. Data is stored inside the customer’s own cloud storage account for maximum control.
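
As an illustration only of the general idea behind data skipping mentioned above, the following Python sketch shows how per-file minimum and maximum statistics can let a query avoid reading files that cannot contain matching rows. It is not Databricks Delta's implementation, which this announcement does not describe; the `FileStats` type, the `files_to_read` function, and the file names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min and max of one numeric column."""
    path: str
    min_value: int
    max_value: int


def files_to_read(stats, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    Files whose range lies entirely outside the query range are skipped
    without being opened.
    """
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Hypothetical catalog of three data files with non-overlapping value ranges.
catalog = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# A query for values between 150 and 180 only needs the second file.
selected = files_to_read(catalog, 150, 180)
print(selected)
```

In this sketch the benefit depends on how well the data is clustered: when values that are queried together sit in the same files, more files fall outside the query range and can be skipped.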

Databricks made today’s announcement at Spark Summit Europe 2017 during Ali Ghodsi’s keynote.

Visit databricks.com for more information.

Contact Databricks to get started: https://www.databricks.com/company/contact.

About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell and HP. For more information, visit www.databricks.com.

Media Contact:

Stacey Collins Burbach

P: 415-310-9767

E: [email protected]
