Engineering | Databricks Blog

How to Avoid Drowning in GDPR Data Subject Requests in a Data Lake

May 1, 2018 by Justin Olsson, Sr. Legal Counsel and Michael Armbrust in Product

Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. With GDPR enforcement rapidly...

Viacom’s Journey to Improving Viewer Experiences with Real-time Analytics at Scale

April 20, 2018 by Michael Ortega in Company

With over 4 billion subscribers, Viacom is focused on delivering amazing viewing experiences to its global audiences. Core to this strategy is ensuring...

Introducing Click: The Command Line Interactive Controller for Kubernetes

March 27, 2018 by Nick Lanham in Engineering

Click is an open-source tool that lets you quickly and easily run commands against Kubernetes resources, without copy/pasting all the time, and that...

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

March 20, 2018 by Joseph Torres, Michael Armbrust, Tathagata Das and Shixiong Zhu in Open Source

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons...

Introducing Stream-Stream Joins in Apache Spark 2.3

March 13, 2018 by Tathagata Das and Joseph Torres in Engineering

Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner joins and some types of outer joins) between...

Announcing Machine Learning Model Export in Databricks

March 7, 2018 by Wayne Chan in Company

In recent years, machine learning has become ubiquitous in industry and production environments. Both academic and industry institutions had previously focused on training...

Apache Spark 2.3 with Native Kubernetes Support

March 6, 2018 by Anirudh Ramanathan and Palak Bhatia in Solutions

This is a community blog from Anirudh Ramanathan and Palak Bhatia, software engineer and product manager respectively at Google, working in the...

Introducing Apache Spark 2.3

February 28, 2018 by Sameer Agarwal, Xiao Li, Reynold Xin and Jules Damji in Engineering

Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want...

Accelerate Innovation with Microsoft Azure Databricks

January 22, 2018 by Brian Dirking in Company

It’s hard to believe that we are already three weeks into 2018. If you’re still struggling to get valuable insights from your data...

Matei Zaharia’s 5 predictions about big data and AI in 2018

January 17, 2018 by Matei Zaharia in Company

Over the past few years, the demand for artificial intelligence (AI) and machine learning capabilities has surged with innovations in natural language processing...