Data + AI Summit 2020 training will be held on 17 November, with an expanded curriculum of half- and full-day classes. These training classes include both lectures and hands-on exercises. The Apache Spark™ 3.x certification exam will also be offered.
Role: Data Scientist, Data Engineer
Duration: Morning
This course covers the new features and changes introduced to Apache Spark and the surrounding ecosystem during the past 12 months. It focuses on Spark 2.4 and 3.0 and covers updates to performance, monitoring, usability, stability, extensibility, PySpark, SparkR, Delta Lake, pandas, and MLflow. Students will also learn about backwards compatibility with 2.x and the considerations required for upgrading to Spark 3.0.
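As a small flavour of the material (not the actual course notebooks), the sketch below shows one of the Spark 3.0 additions this course covers, Adaptive Query Execution, being enabled in PySpark; the app name and toy aggregation are illustrative assumptions.

```python
from pyspark.sql import SparkSession

# Enable Adaptive Query Execution (AQE), a Spark 3.0 feature that
# re-optimises query plans at runtime using shuffle statistics.
spark = (SparkSession.builder
         .appName("spark3-aqe-demo")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions
         .getOrCreate())

df = spark.range(1_000_000)
agg = df.groupBy((df.id % 100).alias("bucket")).count()
agg.explain()   # the physical plan is wrapped in AdaptiveSparkPlan when AQE is on
agg.show(5)
```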
Prerequisites:
Role: Business Leader, Platform Administrator, SQL Analyst, Data Engineer, Data Scientist
Duration: Half day
In this half-day course, students will build an end-to-end batch OLAP data pipeline using Delta Lake. Best practices for design and use of Delta Lake will be discussed and applied throughout.
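To give a sense of the pattern students practise, here is a minimal sketch of a batch pipeline written to and queried from Delta Lake; the paths and column names are assumptions, not the course dataset.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-batch-demo").getOrCreate()

# Read raw data from a hypothetical landing zone.
raw = spark.read.json("/mnt/raw/orders")

# Apply simple curation steps before publishing the table.
cleaned = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts")))

# Write a curated Delta table that downstream OLAP queries can read.
(cleaned.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("/mnt/curated/orders"))

spark.read.format("delta").load("/mnt/curated/orders").createOrReplaceTempView("orders")
spark.sql("SELECT order_date, COUNT(*) AS n FROM orders GROUP BY order_date").show()
```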
Prerequisites:
Role: Data Engineer
Duration: Full day
In this full-day course, students will learn about the Lakehouse data architecture concept, and will build an end-to-end OLAP data pipeline using Delta Lake with streaming data, learning and applying best practices throughout.
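The sketch below illustrates the streaming half of that pattern: Structured Streaming ingesting files into a Delta table. The source path, schema, and checkpoint location are placeholder assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-streaming-demo").getOrCreate()

# Incrementally read newly arriving JSON files from a landing directory.
events = (spark.readStream
          .format("json")
          .schema("device STRING, reading DOUBLE, event_ts TIMESTAMP")
          .load("/mnt/landing/events"))

# Append the stream to a Delta table; the checkpoint tracks progress for exactly-once writes.
query = (events.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/mnt/checkpoints/events")
         .start("/mnt/bronze/events"))
```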
Prerequisites:
Role: Data Engineer
Duration: Full day
This full-day course deepens students’ knowledge of key “problem” areas in Apache Spark and how to mitigate them, and explores new features in Spark 3 that push application performance even further.
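As an illustration only (not the course material), the sketch below shows two mitigations in this spirit: broadcasting a small dimension table to avoid a shuffle, and enabling Spark 3’s skew-join handling; the data is synthetic.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("perf-demo")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.skewJoin.enabled", "true")   # Spark 3.0 skew-join mitigation
         .getOrCreate())

# Synthetic fact and dimension tables.
facts = spark.range(10_000_000).withColumn("key", F.col("id") % 1000)
dims = (spark.range(1000)
        .withColumnRenamed("id", "key")
        .withColumn("label", F.concat(F.lit("k"), F.col("key").cast("string"))))

# Broadcast the small side explicitly so no shuffle of the large table is needed.
joined = facts.join(F.broadcast(dims), "key")
joined.groupBy("label").count().show(5)
```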
Prerequisites:
Role: Data Engineer, Data Scientist
Duration: Afternoon
This half-day course focuses on teaching distributed machine learning with Spark. Students will build and evaluate pipelines with MLlib, understand the differences between single-node and distributed ML, and tune hyperparameters at scale. This class is taught concurrently in Python and Scala.
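For a flavour of the Python side, here is a minimal sketch of an MLlib pipeline with distributed hyperparameter tuning; the synthetic data and column names are assumptions, and a live SparkSession named `spark` (as in a Databricks notebook) is assumed.

```python
from pyspark.sql import functions as F
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Synthetic training data.
data = (spark.range(1000)
        .withColumn("x1", F.rand(seed=1))
        .withColumn("x2", F.rand(seed=2))
        .withColumn("label", 3 * F.col("x1") + 2 * F.col("x2") + F.rand(seed=3)))

assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

# Candidate hyperparameters evaluated with cross-validation across the cluster.
grid = ParamGridBuilder().addGrid(lr.regParam, [0.0, 0.1, 1.0]).build()
cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=RegressionEvaluator(labelCol="label"),
                    numFolds=3,
                    parallelism=4)   # fit candidate models in parallel

model = cv.fit(data)
print(model.avgMetrics)
```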
Prerequisites:
Role: Data Scientist
Duration: Afternoon
This course offers a thorough overview of how to scale training and deployment of neural networks with Apache Spark. We guide students through building deep learning models with TensorFlow, performing distributed inference with Spark UDFs via MLflow, and training a distributed model across a cluster using Horovod. This course is taught entirely in Python.
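The sketch below illustrates the distributed-inference piece: applying an MLflow-logged model as a Spark UDF. The registered model URI and feature columns are hypothetical, and a live SparkSession `spark` is assumed.

```python
import mlflow.pyfunc

# Wrap a (hypothetical) registered MLflow model as a Spark UDF for scoring in parallel.
predict = mlflow.pyfunc.spark_udf(spark, model_uri="models:/digit-classifier/1")

# Placeholder feature data; in practice this would be a large table of features.
features = spark.createDataFrame(
    [(0.1, 0.7, 0.2), (0.9, 0.3, 0.5)],
    ["f1", "f2", "f3"])

scored = features.withColumn("prediction", predict("f1", "f2", "f3"))
scored.show()
```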
Prerequisite:
Role: Data Scientist, Data Engineer
Duration: Full day
In this hands-on course, data scientists and data engineers learn best practices in machine learning operations. Students will learn to manage the machine learning lifecycle using MLflow and to deploy and monitor machine learning solutions in batch, streaming, and real time using REST, while avoiding common production issues. By the end of this course, students will have built, deployed, and monitored a complete machine learning pipeline, all from within Databricks. This course is taught entirely in Python.
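As a minimal sketch of the first step in that lifecycle (tracking a run with MLflow), the example below logs a toy scikit-learn model; the model, metrics, and data are illustrative assumptions, not course content.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy training data.
X, y = make_regression(n_samples=500, n_features=5, random_state=42)

# Track parameters, metrics, and the model artifact in MLflow.
with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_r2", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")  # later registered and served (e.g. via REST)
```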
Prerequisite:
Role: Data Scientist, Data Engineer
Duration: Morning
In this half-day course, you will learn how Databricks and Spark can help solve the real-world problems one faces when working with financial, retail, and manufacturing data. You’ll learn how to deal with dirty data and how to get started with Structured Streaming and real-time analytics. As bonus content, students will also receive a longer take-home capstone exercise in which they can apply all the concepts presented. This class is taught concurrently in Python and Scala.
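The sketch below shows the kind of dirty-data clean-up the class starts from; the input path and columns are placeholder assumptions, and a live SparkSession `spark` is assumed.

```python
from pyspark.sql import functions as F

# Raw CSV extract with missing keys, duplicates, and string-typed numbers.
raw = spark.read.option("header", "true").csv("/mnt/raw/transactions")

clean = (raw
         .dropna(subset=["transaction_id"])                     # drop records missing a key
         .dropDuplicates(["transaction_id"])                    # remove duplicate loads
         .withColumn("amount", F.col("amount").cast("double"))  # fix types read as strings
         .filter(F.col("amount") > 0))                          # discard impossible values

clean.write.format("delta").mode("overwrite").save("/mnt/silver/transactions")
```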
Prerequisite:
Duration: Morning
In this half-day course prepared specifically for Databricks partners, attendees will learn how to perform fine-grained time series forecasting at scale with Facebook Prophet and Apache Spark. The course will begin with an explanation of fine-grained time series and its benefits for retail organisations, then dive into the nuances of performing fine-grained time series forecasting on Databricks. By the end of this course, attendees will leave with techniques that enable them to produce more efficient and accurate time series forecasts at a finer grain than traditional methods allow.
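To illustrate the core pattern, here is a hedged sketch of training one Prophet model per store inside a grouped pandas function on Spark; the table name, columns (`store`, `ds`, `y`), and 30-day horizon are assumptions, and a live SparkSession `spark` is assumed.

```python
import pandas as pd
from fbprophet import Prophet   # packaged as "prophet" in newer releases

def forecast_store(pdf: pd.DataFrame) -> pd.DataFrame:
    # Fit one Prophet model on a single store's history and forecast 30 days ahead.
    model = Prophet()
    model.fit(pdf[["ds", "y"]])
    future = model.make_future_dataframe(periods=30)
    out = model.predict(future)[["ds", "yhat"]]
    out["store"] = pdf["store"].iloc[0]
    return out

sales = spark.table("store_daily_sales")   # hypothetical Delta table of per-store daily sales
forecasts = (sales
             .groupBy("store")
             .applyInPandas(forecast_store, schema="ds timestamp, yhat double, store string"))
forecasts.write.format("delta").mode("overwrite").saveAsTable("store_forecasts")
```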
Prerequisite:
Duration: Afternoon
In this half-day course prepared specifically for Databricks partners, attendees will learn how to modernise traditional value-at-risk (VaR) calculation through the use of various components of the Databricks Unified Data Analytics Platform, including Delta Lake, Apache Spark, and MLflow. The course will begin with an explanation of modern VaR calculation and its benefits for financial services organisations, then dive into the nuances of performing modernised VaR calculations on Databricks. By the end of this course, attendees will leave with techniques that enable them to produce more efficient and accurate VaR calculations than traditional methods.
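As a toy illustration of the underlying idea (not the course’s modernised approach), the sketch below reads a 95% one-day VaR off a Monte Carlo simulation of portfolio returns; all weights, returns, and covariances are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_trials = 100_000
weights = np.array([0.5, 0.3, 0.2])                  # hypothetical portfolio weights
mean_returns = np.array([0.0004, 0.0002, 0.0003])    # assumed daily mean returns
cov = np.diag([0.0001, 0.0004, 0.0002])              # assumed (diagonal) covariance

# Simulate one-day portfolio returns and take the 5th percentile as the loss threshold.
simulated = rng.multivariate_normal(mean_returns, cov, size=n_trials)
portfolio_returns = simulated @ weights
var_95 = -np.percentile(portfolio_returns, 5)        # 95% one-day VaR as a positive loss

print(f"95% one-day VaR: {var_95:.4%} of portfolio value")
```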
Prerequisite:
Role: Data Engineer
Duration: Full day
In this full-day course prepared specifically for Databricks partners, attendees will learn best practices for planning and implementing a migration from Hadoop to Databricks. By highlighting the differences between on-prem Hadoop and the Databricks cloud-native Unified Data Analytics Platform, this class prepares students to avoid common translation pitfalls and to ensure that systems remain performant at scale. Students will also receive a skills-based capstone project.
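A migration involves far more than any single step, but as a hedged illustration, the sketch below converts an existing Parquet dataset (for example, one copied over from HDFS) into a Delta table in place; the path is a placeholder and a live SparkSession `spark` is assumed.

```python
from delta.tables import DeltaTable

# Convert a migrated Parquet directory to Delta without rewriting the data files.
DeltaTable.convertToDelta(spark, "parquet.`/mnt/migrated/warehouse/events`")

# The data is now queryable as a Delta table.
spark.read.format("delta").load("/mnt/migrated/warehouse/events").show(5)
```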
Prerequisite:
Role: Data Engineer
Duration: Full day
In this full-day course prepared specifically for Databricks partners, attendees will learn material that will help them build proofs of concept (POCs) for potential Databricks customers. This training is an accelerated version of the traditional week-long Partner Flight School and will include:
Prerequisite:
Role: SQL Analyst
Duration: 90 minutes
In this 90-minute course for SQL analysts, we will introduce a new data management architecture, the Lakehouse, which allows you to achieve high performance while querying directly on your data lake. We will use a SQL analytics tool to query data and create dashboards. First, we’ll practice writing and visualizing queries in a guided lesson. Then, you’ll have time to create your own dashboard complete with parameterized queries and automated alerts.
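As a small, hedged illustration of the kind of query a dashboard tile might be built on (run here via `spark.sql` in Python rather than the SQL editor), the sketch below aggregates daily revenue for one region; the `sales` table, its columns, and the region value standing in for a dashboard parameter are all assumptions.

```python
# Placeholder for a dashboard parameter; in the SQL analytics editor this
# value would come from a query parameter rather than Python code.
region = "EMEA"

daily_revenue = spark.sql(f"""
    SELECT order_date, SUM(amount) AS revenue
    FROM sales
    WHERE region = '{region}'
    GROUP BY order_date
    ORDER BY order_date
""")
daily_revenue.show(10)
```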