Training



Data + AI Summit 2020 training will be held on 17 November, with an expanded curriculum of half-day and full-day classes. These classes combine lecture with hands-on exercises. An Apache Spark™ 3.x certification exam is also offered.

What's New in Apache Spark 3.0?
Free for all registered attendees

Role: Data Scientist, Data Engineer
Duration: Morning

This course covers the new features and changes introduced to Apache Spark and the surrounding ecosystem over the past 12 months. It focuses on Spark 2.4 and 3.0, covering updates to performance, monitoring, usability, stability, extensibility, PySpark, SparkR, Delta Lake, pandas, and MLflow. Students will also learn about backward compatibility with 2.x and the considerations required when upgrading to Spark 3.0.

Prerequisites:

  • Familiarity with Apache Spark 2.x
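
For a flavor of the material, one headline Spark 3.0 change covered in this course is adaptive query execution (AQE). Below is a minimal PySpark sketch of enabling it; the table and column names are illustrative.

    from pyspark.sql import SparkSession

    # Spark 3.0 adds adaptive query execution (AQE), which re-optimizes
    # plans at runtime using shuffle statistics. It is off by default in 3.0.
    spark = (SparkSession.builder
             .appName("spark3-aqe-demo")
             .config("spark.sql.adaptive.enabled", "true")
             .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
             .getOrCreate())

    # Any shuffle-heavy query now benefits from runtime partition coalescing
    # and skew handling. "events" is an illustrative table name.
    spark.table("events").groupBy("user_id").count().explain()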

Delta Lake Hands-on
Free for all registered attendees

Role: Business Leader, Platform Administrator, SQL Analyst, Data Engineer, Data Scientist
Duration: Half day

In this half-day course, students will build an end-to-end batch OLAP data pipeline using Delta Lake. Best practices for design and use of Delta Lake will be discussed and applied throughout.

Prerequisites:

  • Intermediate programming skills in Python or Scala
  • Intermediate SQL skills
  • Beginner experience using the Spark DataFrames API
  • Beginner knowledge of general data engineering concepts
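
To give a sense of the hands-on work, here is a minimal sketch of a batch round trip through a Delta table. It assumes a SparkSession (`spark`) with the Delta Lake library available, as on a Databricks cluster; the paths and column names are illustrative.

    # Bronze: ingest raw data (path is illustrative).
    raw = spark.read.json("/mnt/raw/orders")

    # Silver: basic cleanup before analytics.
    cleaned = raw.dropDuplicates(["order_id"]).na.drop(subset=["order_id"])

    # Writing with format("delta") produces an ACID table backed by a
    # transaction log.
    cleaned.write.format("delta").mode("overwrite").save("/mnt/silver/orders")

    # Downstream OLAP queries read the same path as an ordinary DataFrame.
    orders = spark.read.format("delta").load("/mnt/silver/orders")
    orders.groupBy("country").sum("amount").show()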

Data Engineering with Delta Lake

Role: Data Engineer
Duration: Full day

In this full-day course, students will learn about the Lakehouse data architecture concept, and will build an end-to-end OLAP data pipeline using Delta Lake with streaming data, learning and applying best practices throughout.

Prerequisites:

  • Intermediate programming skills in Python or Scala
  • Intermediate SQL skills
  • Beginner experience using the Spark DataFrames API
  • Beginner knowledge of general data engineering concepts
  • Beginner knowledge of the core features and use cases of Delta Lake
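
As a rough sketch of the streaming leg of such a pipeline (again assuming a Delta-enabled `spark` session; the source directory, checkpoint path, and schema are illustrative):

    from pyspark.sql.types import StructType, StringType, DoubleType

    schema = (StructType()
              .add("order_id", StringType())
              .add("amount", DoubleType()))

    stream = spark.readStream.schema(schema).json("/mnt/landing/orders")

    # Continuously append arriving records to a Delta table; the checkpoint
    # makes the stream restartable with exactly-once sink semantics.
    (stream.writeStream
           .format("delta")
           .option("checkpointLocation", "/mnt/checkpoints/orders")
           .outputMode("append")
           .start("/mnt/silver/orders_stream"))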

Performance Tuning on Apache Spark

Role: Data Engineer
Duration: Full day

This full-day course deepens students' knowledge of key “problem” areas in Apache Spark, shows how to mitigate those problems, and explores new features in Spark 3 that further push the envelope of application performance.

Prerequisites:

  • Intermediate experience with Python
  • Beginner experience with the PySpark DataFrame API (or completion of the Apache Spark Programming with Databricks class)
  • Working knowledge of machine learning and data science
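
One classic problem area of the kind this course targets is a shuffle join between a large fact table and a small dimension table. A brief sketch of the standard fix, with illustrative table names:

    from pyspark.sql.functions import broadcast

    facts = spark.table("sales")   # large fact table (illustrative)
    dims = spark.table("stores")   # small dimension table (illustrative)

    # Broadcasting the small table ships it to every executor, replacing
    # a full shuffle of the large side with a local hash join.
    joined = facts.join(broadcast(dims), "store_id")
    joined.explain()  # BroadcastHashJoin should appear in the plan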

Scalable Machine Learning with Apache Spark

Role: Data Engineer, Data Scientist
Duration: Afternoon

This half-day course focuses on teaching distributed machine learning with Spark. Students will build and evaluate pipelines with MLlib, understand the differences between single node and distributed ML, and optimize hyperparameter tuning at scale. This class is taught concurrently in Python and Scala.

Prerequisites:

  • Intermediate experience with Python
  • Beginner experience with the PySpark DataFrame API (or completion of the Apache Spark Programming with Databricks class)
  • Working knowledge of machine learning and data science
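
A minimal sketch of the kind of MLlib pipeline and distributed tuning the course builds, shown in Python; the dataset and column names are illustrative:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.ml.evaluation import RegressionEvaluator

    assembler = VectorAssembler(inputCols=["bedrooms", "bathrooms"],
                                outputCol="features")
    lr = LinearRegression(labelCol="price")
    pipeline = Pipeline(stages=[assembler, lr])

    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()

    # CrossValidator distributes model fitting across the cluster;
    # parallelism controls how many candidate models fit at once.
    cv = CrossValidator(estimator=pipeline,
                        estimatorParamMaps=grid,
                        evaluator=RegressionEvaluator(labelCol="price"),
                        numFolds=3,
                        parallelism=4)

    model = cv.fit(spark.table("listings"))  # "listings" is illustrative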

Scalable Deep Learning with TensorFlow and Apache Spark

Role: Data Scientist
Duration: Afternoon

This course offers a thorough overview of how to scale training and deployment of neural networks with Apache Spark. We guide students through building deep learning models with TensorFlow, performing distributed inference with Spark UDFs via MLflow, and training a distributed model across a cluster using Horovod. This course is taught entirely in Python.

Prerequisites:

  • Experience programming in Python and PySpark
  • Basic understanding of Machine Learning concepts
  • Prior experience with Keras/TensorFlow highly encouraged
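
As a small taste of the inference portion, MLflow can wrap a logged model as a Spark UDF for distributed scoring. The model URI and column names below are illustrative:

    import mlflow.pyfunc

    # Wrap a previously logged MLflow model as a Spark UDF.
    predict = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn/1")

    # Score a table in parallel across the cluster.
    scored = (spark.table("customers")
              .withColumn("prediction",
                          predict("tenure", "monthly_charges")))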

Machine Learning in Production: MLflow and Model Deployment

Role: Data Scientist, Data Engineer
Duration: Full day

In this hands-on course, data scientists and data engineers learn best practices in machine learning operations. Students will learn to manage the machine learning lifecycle using MLflow and deploy and monitor machine learning solutions in batch, streaming, and real-time using REST while avoiding common production issues. By the end of this course, students will have built, deployed and monitored a complete machine learning pipeline all from within Databricks. This course is taught entirely in Python.

Prerequisites:

  • Experience programming in Python
  • Working knowledge of ML concepts
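
For orientation, a minimal MLflow tracking run of the kind this course builds on might look like the following; the model, parameters, and metric are illustrative:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Each run records parameters, metrics, and the model artifact,
    # which can later be registered and deployed.
    with mlflow.start_run():
        model = LogisticRegression(max_iter=200).fit(X, y)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")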

Practical Data Engineering in Industry: Data Pipelines with Apache Spark

Role: Data Scientist, Data Engineer
Duration: Morning

In this half-day course, you will learn how Databricks and Spark can help solve the real-world problems that arise when working with financial, retail, and manufacturing data. You'll learn how to deal with dirty data and how to get started with Structured Streaming and real-time analytics. Students will also receive a longer take-home capstone exercise as bonus content, in which they can apply all the concepts presented. This class is taught concurrently in Python and Scala.

Prerequisites:

  • Beginner to intermediate experience with the DataFrames API
  • Intermediate to advanced programming experience in Python or Scala
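
To illustrate the dirty-data theme, here is a small sketch of the kind of cleanup step covered; the file path and column names are illustrative:

    from pyspark.sql import functions as F

    raw = spark.read.option("header", "true").csv("/mnt/raw/transactions")

    clean = (raw
             # Enforce a numeric type; unparseable values become null...
             .withColumn("amount", F.col("amount").cast("double"))
             # ...and are dropped here.
             .filter(F.col("amount").isNotNull())
             # Normalize free-text categories.
             .withColumn("category", F.lower(F.trim(F.col("category")))))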

Generating Reliable Inventory Forecasts with Databricks
Available for Databricks Partners

Duration: Morning

In this half-day course prepared specifically for Databricks partners, attendees will learn how to perform fine-grained time series forecasting at scale with Facebook Prophet and Apache Spark. The course will begin with an explanation of fine-grained time series forecasting and its benefits for retail organizations, then dive into the nuances of performing it on Databricks. By the end of this course, attendees will leave with techniques for producing more efficient and accurate time series forecasts at a finer grain than traditional methods allow.

Prerequisites:

  • Intermediate knowledge of data science and machine learning concepts
  • Intermediate domain knowledge of time series forecasting and analysis as it pertains to the retail industry
  • Intermediate experience using Databricks for data science and machine learning workflows
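
A rough sketch of the core pattern, fitting one Prophet model per store in parallel with a grouped pandas UDF; the table, columns, and forecast horizon are illustrative (Prophet was packaged as fbprophet in 2020-era releases):

    import pandas as pd
    from prophet import Prophet  # "fbprophet" in older releases

    def forecast_store(pdf: pd.DataFrame) -> pd.DataFrame:
        # Prophet expects columns named "ds" (date) and "y" (value).
        model = Prophet().fit(pdf.rename(columns={"date": "ds", "sales": "y"}))
        future = model.make_future_dataframe(periods=30)
        result = model.predict(future)[["ds", "yhat"]]
        result["store_id"] = pdf["store_id"].iloc[0]
        return result

    # Each store's history is handed to one task, so thousands of models
    # fit in parallel across the cluster.
    forecasts = (spark.table("store_sales")
                 .groupBy("store_id")
                 .applyInPandas(forecast_store,
                                schema="ds timestamp, yhat double, store_id string"))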

Modernizing Financial Risk Management with Databricks
Available for Databricks Partners

Duration: Afternoon

In this half-day course prepared specifically for Databricks partners, attendees will learn how to modernize traditional value-at-risk (VaR) calculation using components of the Databricks Unified Data Analytics Platform, including Delta Lake, Apache Spark, and MLflow. The course will begin with an explanation of modern VaR calculation and its benefits for financial services organizations, then dive into the nuances of performing modernized VaR calculations on Databricks. By the end of this course, attendees will leave with techniques for producing more efficient and accurate VaR calculations than traditional methods allow.

Prerequisites:

  • Intermediate knowledge of data science and machine learning concepts
  • Intermediate domain knowledge of time series forecasting and analysis as it pertains to the financial industry
  • Intermediate experience using Databricks for data science and machine learning workflows
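
For orientation, the arithmetic at the heart of VaR is simple to state. A toy Monte Carlo sketch in plain Python; the return distribution and confidence level are illustrative:

    import numpy as np

    # Simulate one-day portfolio returns (a toy normal model).
    rng = np.random.default_rng(seed=42)
    returns = rng.normal(loc=0.0005, scale=0.02, size=100_000)

    # 99% VaR is the loss exceeded on only 1% of simulated days.
    var_99 = -np.percentile(returns, 1)
    print(f"1-day 99% VaR: {var_99:.2%} of portfolio value")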

Migrating from Hadoop to Databricks
Available for Databricks Partners

Role: Data Engineer
Duration: Full Day

In this full-day course prepared specifically for Databricks partners, attendees will learn best practices for planning and implementing a migration from Hadoop to Databricks. By highlighting the differences between on-prem Hadoop and Databricks' cloud-native Unified Data Analytics Platform, this class prepares students to avoid common translation pitfalls and to ensure that systems will be performant at scale. Students will also receive a skills-based capstone project.

Prerequisites:

  • Intermediate experience with Python/Scala programming
  • Intermediate experience using SQL
  • Intermediate knowledge of Spark programming
  • Intermediate knowledge of data engineering concepts
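
As one small example of the translation work involved, an existing HiveQL batch query typically ports to Databricks unchanged through Spark SQL; the table names here are illustrative:

    # A HiveQL aggregation usually runs as-is on Spark SQL.
    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS events
        FROM web_logs
        GROUP BY event_date
    """)

    # Landing the result in Delta replaces the old HDFS/ORC output step.
    daily.write.format("delta").mode("overwrite").saveAsTable("web_logs_daily")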

Architecting Databricks (Accelerated Partner Flight School)
Available for Databricks Partners

Role: Data Engineer
Duration: Full Day

In this full-day course prepared specifically for Databricks partners, attendees will learn material that will help them build proofs of concept (POCs) for potential Databricks customers. This training is an accelerated version of the traditional week-long Partner Flight School and will include:

  • Technical presentations on Databricks functionality
  • Presentations on best practices/resources for selling and positioning Databricks

Prerequisites:

  • Intermediate experience with Python/Scala programming
  • Intermediate experience using SQL
  • Intermediate knowledge of Spark programming
  • Intermediate knowledge of data engineering concepts
  • Experience in a Technical Sales Role

Introduction to SQL Analytics on the Lakehouse Architecture

Role: SQL Analyst
Duration: 90 minutes

In this 90-minute course for SQL analysts, we will introduce a new data management architecture, the Lakehouse, which allows you to achieve high performance while querying directly against your data lake. We will use a SQL analytics tool to query data and create dashboards. First, we'll practice writing and visualizing queries in a guided lesson. Then, you'll have time to create your own dashboard, complete with parameterized queries and automated alerts.