Solutions | Databricks Blog


Customer Lifetime Value Part 1: Estimating Customer Lifetimes

June 3, 2020 by Rob Saker, Bryan Smith, Bilal Obeidat and Chris Robison in Solutions

Download the Customer Lifetimes Part 1 notebook to demo the solution covered below, and watch the on-demand virtual workshop to learn more. You...

Vectorized R I/O in Upcoming Apache Spark 3.0

June 1, 2020 by Hyukjin Kwon in Platform

R is one of the most popular computer languages in data science, specifically dedicated to statistical analysis with a number of extensions, such...

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

May 29, 2020 by Wenchen Fan, Herman van Hövell and MaryAnn Xue in Engineering

Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...

Schema Evolution in Merge Operations and Operational Metrics in Delta Lake

May 19, 2020 by Tathagata Das and Denny Lee in Platform

Get an early preview of O'Reilly's new ebook for the step-by-step guidance you need to start using Delta Lake. Try this notebook to...

Shrink Training Time and Cost Using NVIDIA GPU-Accelerated XGBoost and Apache Spark™ on Databricks

May 15, 2020 by Niranjan Nataraja and Karthikeyan Rajendran in Solutions

Guest Blog by Niranjan Nataraja and Karthikeyan Rajendran of Nvidia. Niranjan Nataraja is a lead data scientist at Nvidia and specializes in building...

Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a Preview of Apache Spark 3.0

May 13, 2020 by Yin Huai, Wenchen Fan and Xiao Li in Platform

Introducing Databricks Runtime 7.0 Beta. We’re excited to announce that the Apache Spark™ 3.0.0-preview2 release is available on Databricks as part of...

Glow 0.3.0 Introduces New Large-Scale Genomic Analysis Features

April 23, 2020 by Kiavash Kianfar in Engineering

In October of last year, Databricks and the Regeneron Genetics Center® partnered to introduce Project Glow, an open-source analysis tool...

COVID-19 Datasets Now Available on Databricks: How the Data Community Can Help

April 14, 2020 by Denny Lee in Engineering

Initially published April 14th, 2020; updated April 21st, 2020. With the massive disruption of the current COVID-19 pandemic, many data engineers and data...

10 Minutes from pandas to Koalas on Apache Spark

March 30, 2020 by Haejoon Lee, Yifan Cao, Hyukjin Kwon and Takuya Ueshin in Solutions

This is a guest community post from Haejoon Lee, a software engineer at Mobigen in South Korea and a Koalas contributor. pandas is...

Trust but Verify with Databricks

March 24, 2020 by Anna Shrestinian, Abhinav Garg and Sajith Appukuttan in Platform

As enterprises modernize their data infrastructure to make data-driven decisions, teams across the organization become consumers of that platform. The data workloads grow...