How the Minnesota Twins Scaled Pitch Scenario Analysis to Measure Player Performance - Part 1

Statistical Analysis in the Game of Baseball: A single pitch in Major League Baseball (MLB) generates tens of megabytes of data, from pitch...

Customer Lifetime Value Part 1: Estimating Customer Lifetimes

Download the Customer Lifetimes Part 1 notebook to demo the solution covered below, and watch the on-demand virtual workshop to learn more. You...

Monitor Your Databricks Workspace with Audit Logs

June 2, 2020 by Craig Ng and Miklos Christine
Cloud computing has fundamentally changed how companies operate - users are no longer subject to the restrictions of on-premises hardware deployments such as...

Vectorized R I/O in Upcoming Apache Spark 3.0

June 1, 2020 by Hyukjin Kwon
R is one of the most popular computer languages in data science, specifically dedicated to statistical analysis with a number of extensions, such...

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

Read Rise of the Data Lakehouse to explore why lakehouses are the data architecture of the future with the father of the data...

Announcing a New Redash Connector for Databricks

May 28, 2020 by Can Efeoglu and James Nguyen
We’re happy to introduce a new, open source connector with Redash, a cloud-based SQL analytics service, to make it easy to query...

Automating away engineering on-call workflows at Databricks

May 28, 2020 by Andrew Nitu
A Summer of Self-healing: This summer I interned with the Cloud Infrastructure team. The team is responsible for building scalable infrastructure to support...

Modernizing Risk Management Part 1: Streaming data-ingestion, rapid model development and Monte-Carlo Simulations at Scale

May 27, 2020 by Antoine Amend
Part 2 of this accelerator here. Managing risk within the financial services, especially within the banking sector, has increased in complexity...

MLOps takes center stage at Spark + AI Summit

May 26, 2020 by Ben Lorica
As companies ramp up machine learning, the growth in the number of models they have under development begins to impact their set of...

New Pandas UDFs and Python Type Hints in the Upcoming Release of Apache Spark 3.0

May 19, 2020 by Hyukjin Kwon
Pandas user-defined functions (UDFs) are one of the most significant enhancements in Apache Spark™ for data science. They bring many benefits, such...