Solutions | Databricks Blog

ACID Transactions on Data Lakes Tech Talks: Getting Started with Delta Lake

November 23, 2020 by Ryan Boyd in Platform

Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. As part of our...

How to Train XGBoost With Spark

November 16, 2020 by Stephen Offer in Data Science and ML

XGBoost is currently one of the most popular machine learning libraries, and distributed training is increasingly required to accommodate the rapidly...

Detecting Criminals and Nation States through DNS Analytics

October 5, 2020 by Monzy Merza, Zafer Bilaloglu and Arun Pamulapati in Platform

Quick link to the accelerator notebooks referenced throughout this post. You are a security practitioner, a data scientist or a security data engineer...

Announcing Databricks Labs Terraform integration on AWS and Azure

September 11, 2020 by Serge Smertin and Sri Tikkireddy in Platform

We are pleased to announce integration for deploying and managing Databricks environments on Microsoft Azure and Amazon Web Services (AWS) with HashiCorp Terraform...

An Update on Project Zen: Improving Apache Spark for Python Users

September 4, 2020 by Hyukjin Kwon and Matei Zaharia in Solutions

Apache Spark™ has reached its 10th anniversary with Apache Spark 3.0, which has many significant improvements and new features, including but not limited...

Enabling Spark SQL DDL and DML in Delta Lake on Apache Spark 3.0

August 27, 2020 by Tathagata Das, Burak Yavuz and Denny Lee in Solutions

Last week, we had...

Interoperability between Koalas and Apache Spark

August 11, 2020 by Takuya Ueshin, Hyukjin Kwon and Xiao Li in Solutions

Koalas is an open source project which provides a drop-in replacement for pandas, enabling efficient scaling out to hundreds of worker nodes for...

A look at the new Structured Streaming UI in Apache Spark 3.0

July 29, 2020 by Genmao Yu, Yuanjian Li and Shixiong Zhu in Platform

This is a guest community post from Genmao Yu, a software engineer at Alibaba. Structured Streaming was initially introduced in Apache Spark 2.0...

Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies

July 2, 2020 by Greg Wood and Rebecca Li in Platform

What is a Databricks cluster policy? A Databricks cluster policy is a template that restricts the way users interact with cluster configuration. Today...

Time Traveling with Delta Lake: A Retrospective of the Last Year

June 18, 2020 by Burak Yavuz and Denny Lee in Platform

Try out Delta Lake...