Engineering | Databricks Blog

On-Demand Virtual Session: Customer Lifetime Value

June 16, 2020 by Rob Saker, Bryan Smith, Hector Leano and Steve Sobel in Engineering

Before you can provide personalized services and offers to your customers, you need to know who they are. In this virtual workshop, retail...

Simplify Data Conversion from Apache Spark to TensorFlow and PyTorch

June 16, 2020 by Liang Zhang and Weichen Xu in Engineering

Petastorm is a popular open-source library from Uber that enables single machine or distributed training and evaluation of deep learning models from datasets...

Accelerating Somatic Variant Calling with the Databricks TNSeq Pipeline

June 15, 2020 by Henry Davidge and Frank Nothaft in Engineering

Genetic analyses are a critical tool in revolutionizing how we treat cancer. By understanding the mutations present in tumor cells, researchers can gain...

Data Teams Unite! Countdown to Spark + AI Summit

June 10, 2020 by Diane Romualdez in Company

Here is a helpful guide to get you prepared for Spark + AI Summit, June 22-26, 2020. Spark + AI Summit 2020 is...

Modernizing Risk Management Part 2: Aggregations, Backtesting at Scale and Introducing Alternative Data

June 5, 2020 by Antoine Amend in Platform

Understanding and mitigating risk is at the forefront of any financial services institution. However, as previously discussed in the first blog of this...

Automate continuous integration and continuous delivery on Databricks using Databricks Labs CI/CD Templates

June 5, 2020 by Michael Shtelma and Thunder Shiviah in Platform

Contents: Overview; Why do we need yet another deployment framework?; Simplifying CI/CD on Databricks via reusable templates; Development lifecycle using Databricks Deployments; How...

Customer Lifetime Value Part 1: Estimating Customer Lifetimes

June 3, 2020 by Rob Saker, Bryan Smith, Bilal Obeidat and Chris Robison in Solutions

Download the Customer Lifetimes Part 1 notebook to demo the solution covered below, and watch the on-demand virtual workshop to learn more. You...

Monitor Your Databricks Workspace with Audit Logs

June 2, 2020 by Craig Ng and Miklos Christine in Platform

Cloud computing has fundamentally changed how companies operate - users are no longer subject to the restrictions of on-premises hardware deployments such as...

Vectorized R I/O in Upcoming Apache Spark 3.0

June 1, 2020 by Hyukjin Kwon in Platform

R is one of the most popular computer languages in data science, specifically dedicated to statistical analysis with a number of extensions, such...

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

May 29, 2020 by Wenchen Fan, Herman van Hövell and MaryAnn Xue in Engineering

Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...