By Customer Demand: Databricks and Snowflake Integration
Today, we are proud to announce a partnership between Snowflake and Databricks that will help our customers further unify Big Data and AI by providing an optimized, production-grade integration between Snowflake’s built-for-the-cloud data warehouse and Databricks’ Unified Analytics Platform. Over the course of the last year, our joint customers such as Rue...
Databricks Delta: A Unified Data Management System for Real-time Big Data
Combining the best of data warehouses, data lakes and streaming. For an in-depth look and demo, join the webinar. Today we are proud to introduce Databricks Delta, a unified data management system to simplify large-scale data management. Currently, organizations build their big data architectures using a mix of systems, including data warehouses, data lakes and...
Arbitrary Stateful Processing in Apache Spark’s Structured Streaming
This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming. Introduction: Most data streams, though continuous in flow, have discrete events within streams, each marked by a timestamp when an event transpired. As a consequence, this idea of “event-time” is central to...
Best Practices for Coarse Grained Data Security in Databricks
At Databricks, we work with hundreds of companies, all pushing the bleeding edge in their respective industries. We want to share patterns for securing data so that your organization can leverage best practices as opposed to reinventing the wheel when you onboard to Databricks’ Unified Analytics Platform. This post is primarily aimed at those...
Sharing Knowledge with the Community in a Preview of Apache Spark: The Definitive Guide
Apache Spark has seen immense growth over the past several years. The size and scale of this Spark Summit is a true reflection of innovation after innovation that has made its way into the Apache Spark project. Hundreds of contributors working collectively have made Spark an amazing piece of technology powering thousands of organizations, and...
Transactional Writes to Cloud Storage on Databricks
In another blog post published today, we showed the top five reasons for choosing S3 over HDFS. With the dominance of simple and effective cloud storage systems such as Amazon S3, the assumptions of on-premise systems like Apache Hadoop are becoming, sometimes painfully, clear. Apache Spark users require both fast and transactionally correct writes to...
Working with Nested Data Using Higher Order Functions in SQL on Databricks
Nested data types offer Databricks customers and Apache Spark users powerful ways to manipulate structured data. In particular, they allow you to put complex objects like arrays, maps and structures inside of columns. This can help you model your data in a more natural way. While this feature is certainly useful, it can be a...
Taking Apache Spark’s Structured Streaming to Production
This is the fifth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. At Databricks, we’ve migrated our production pipelines to Structured Streaming over the past several months and wanted to share our out-of-the-box deployment model to allow our customers to rapidly build production pipelines in Databricks. A...
Query Watchdog: Handling Disruptive Queries in Spark SQL
At Databricks, our users range from SQL Analysts who explore data through JDBC connections and SQL Notebooks to Data Engineers orchestrating large-scale ETL jobs. While this is great for data democratization, one challenge associated with exploratory data analysis is handling rogue queries that appear as if they will finish, but never actually will. These...
Databricks Launches a Comprehensive Guide for Its Product and Apache Spark
We are proud to announce the launch of a new online guide for Databricks and Apache Spark at docs.databricks.com. Our goal is to create a definitive resource for Databricks users and the most comprehensive set of Apache Spark documentation on the web. As a result, we've dedicated a large portion of the guide to Spark...