Announcing Databricks Labs Terraform integration on AWS and Azure

September 11, 2020 by Serge Smertin and Sri Tikkireddy
We are pleased to announce integration for deploying and managing Databricks environments on Microsoft Azure and Amazon Web Services (AWS) with HashiCorp Terraform...

An Update on Project Zen: Improving Apache Spark for Python Users

September 4, 2020 by Hyukjin Kwon and Matei Zaharia
Apache Spark™ has reached its 10th anniversary with Apache Spark 3.0 which has many significant improvements and new features including but not limited...

Enabling Spark SQL DDL and DML in Delta Lake on Apache Spark 3.0

August 27, 2020 by Tathagata Das, Burak Yavuz and Denny Lee
Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Last week, we had...

Interoperability between Koalas and Apache Spark

August 11, 2020 by Takuya Ueshin, Hyukjin Kwon and Xiao Li
Koalas is an open source project which provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for...

A look at the new Structured Streaming UI in Apache Spark 3.0

This is a guest community post from Genmao Yu, a software engineer at Alibaba. Structured Streaming was initially introduced in Apache Spark 2.0...

Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies

July 2, 2020 by Greg Wood and Rebecca Li
What is a Databricks cluster policy? A Databricks cluster policy is a template that restricts the way users interact with cluster configuration. Today...

Time Traveling with Delta Lake: A Retrospective of the Last Year

June 18, 2020 by Burak Yavuz and Denny Lee
Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Try out Delta Lake...

Customer Lifetime Value Part 1: Estimating Customer Lifetimes

Download the Customer Lifetimes Part 1 notebook to demo the solution covered below, and watch the on-demand virtual workshop to learn more. You...

Vectorized R I/O in Upcoming Apache Spark 3.0

June 1, 2020 by Hyukjin Kwon
R is one of the most popular computer languages in data science, specifically dedicated to statistical analysis with a number of extensions, such...

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...