A core component of Databricks is the Data Science Workspace, a collaborative notebook environment shared by everyone on the data team: data scientists, data analysts, data engineers and others. Databricks is used across a wide variety of industries for an equally broad set of use cases. This gallery showcases some of the possibilities through notebooks that can easily be imported into your own Databricks environment or the free Community Edition (CE).

Delta Lake

Build your data lakehouse and get ACID transactions, time travel, constraints and more on open file formats

Databricks: 7.6.x – not CE

Deep Dive into Delta Lake

This is a deep dive into Delta Lake, which is an open-source storage format that brings ACID transactions to Apache Spark™.

Delta, PySpark, data management

Databricks: 8.0.x

Using Delta Lake from R

This is a quick 101 introduction to using Delta Lake, an open-source storage format, from SparkR.

r, Delta, data management

Databricks: 7.6.x

Tutorial: Intro to Delta Lake

Delta Lake: An open-source storage format that brings ACID transactions to Apache Spark and big data workloads.

Delta, PySpark, time travel, data management

Koalas

Effortlessly move your pandas data science code from single-node to distributed clusters

Databricks: 7.6.x

Pandas to Koalas in 10 minutes

A gentle introduction to Koalas for those already familiar with pandas who need to move to large-scale problems.

eda, Pandas, koalas, PySpark

Machine Learning

Support for popular machine learning frameworks such as TensorFlow, Spark MLlib and Horovod

Databricks: 7.6.x w/GPU – not CE

Distributed deep learning with PyTorch and Horovod

Learn how to perform distributed training of models in PyTorch using Horovod.

PyTorch, Horovod, distributed training

Databricks: 8.1.x

Build a streaming ML application with Spark

Build a streaming ML application that monitors credit card fraud using Spark.

PySpark, fraud detection, Structured Streaming

Databricks: 7.6.x

Getting started with Spark MLlib

An introduction to using the Spark MLlib library for ML applications.

MLlib, PySpark, feature engineering, hyperparameter search

Databricks: 7.6.x w/GPU – not CE

From Spark to TensorFlow: Simplify your data conversion

Simplify the conversion of data from Spark DataFrames for use with TensorFlow.

PySpark, petastorm, TensorFlow, distributed training

Databricks: 7.6.x w/GPU – not CE

Distributed deep learning with TensorFlow 2

Learn how to perform distributed training with TensorFlow 2.

TensorFlow, deep learning, distributed training

MLflow

End-to-end support for machine learning: from training your models to moving them into production

Databricks: 7.6.x

Get started with logging for ML projects with MLflow

An introduction to the MLflow logging API for ML workflow management.

MLflow, scikit-learn, random forest

Databricks: 7.6.x

Quick Start: How to use MLflow fluent tracking APIs

Learn how to use the high-level fluent tracking APIs in MLflow.

MLflow, fluent, scikit-learn, random forest

Databricks: 7.6.x – not CE

An end-to-end example of machine learning for tabular data

This is a notebook showcasing an example of an end-to-end ML lifecycle for tabular data.

MLflow, scikit-learn, random forest, hyperparameter search

Databricks: 8.0.x

MLflow Quick Start with R

Learn how to use MLflow for ML tracking in R.

r, MLflow

Apache Spark™

The distributed computing engine that powers data engineering and data science for the data lakehouse

Databricks: 8.1.x

Streaming applications for sensor data

Learn how to use Structured Streaming in Spark for sensor data applications.

SQL, PySpark, Structured Streaming

Databricks: 8.1.x

Analysis of San Francisco fire calls with Spark

Use Spark ETL to analyze the calls to the San Francisco Fire Department.

ETL, eda, PySpark

Databricks: 8.1.x

Interacting with external data sources from Spark

A brief introduction to accessing and interacting with external data sources from Spark.

SQL, udf, Spark, Scala

Databricks: 8.1.x

Structured Streaming for real-time applications

An introduction to the semantics of Structured Streaming in Spark for real-time data.

SQL, PySpark, Structured Streaming

Databricks: 8.0.x – not CE

Extend SparkR with user-defined functions (UDFs)

Learn how to extend the capabilities of SparkR through custom functions written using UDFs in R.

udf, SparkR, distributed computing

Databricks: 8.1.x

Adaptive query execution

An illustration of adaptive query execution (AQE) in Spark 3.0.

SQL, PySpark, adaptive query execution (AQE)

Use Cases

Databricks is used across many industries, including finance, retail, technology and manufacturing

Databricks: 7.6.x

Market basket analysis for retail

This is a notebook showcasing how to perform market basket analysis for retail.

retail and consumer goods, market basket analysis

Databricks: 7.6.x

Scaling finance time series with Spark

Use Spark to analyze financial time series data to identify market manipulation.

PySpark, time series, fraud detection

Solution Accelerators

Complete templates for using Databricks in five different industries

Explore solutions