Spark + AI Summit 2020 features a number of pre-conference training workshops that include a mix of instruction and hands-on exercises to help you improve your Apache Spark™ and data engineering skills. Learn how to leverage Apache Spark as your unified analytics engine for building data pipelines and machine learning. Expand your data science skills by better understanding the machine learning lifecycle with MLflow, or dive into a deep learning tutorial with Keras and TensorFlow.
The training workshops are offered as add-ons to the Conference Pass.
Students will need to bring their own laptops with a Chrome or Firefox browser and access to *.databricks.com.
Role: Business Leader, Platform Administrator, SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
Discover how Unified Data Analytics solves some of the common business problems associated with Big Data. You’ll learn how to apply organizational best practices that will help your data teams work better together when they have a single source of truth.
Prerequisites:
Role: Business Leader, Platform Administrator, SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
This course describes the core features of Delta Lake. It covers how Delta Lake simplifies and optimizes data architecture and the engineering of data pipelines. Upon completion, participants will understand how Delta Lake brings reliability, performance, and lifecycle management to data lakes.
Prerequisites:
Role: Platform Administrator
Duration: Half Day
Learn how to manage a Databricks account at an organizational level. Participants will come away knowing how to manage users and groups, including provisioning, access control and workspace storage.
Prerequisites:
Role: Data Engineer, Data Scientist
Duration: Full Day
This hands-on course covers the fundamentals of Apache Spark™ programming, providing the essential concepts and skills you’ll need to navigate the Spark documentation and immediately start programming. Using case studies, you’ll explore the core components of the DataFrame API. Students will read and write data to various sources, preprocess data by correcting schemas and parsing different data types, and apply a variety of DataFrame transformations and actions to answer business questions.
Prerequisites:
Role: SQL Analyst
Duration: Full Day
This hands-on course shows learners how to use Spark SQL in the Databricks environment. You’ll learn how to read, transform, and write data using the SQL extensions provided by Apache Spark. The course also includes a cursory introduction to key topics unique to working with a distributed system like Apache Spark, as compared to a traditional RDBMS.
Prerequisites:
Role: Data Engineer
Duration: Full Day
Take a deep dive into tuning Spark applications: develop best practices and learn to avoid many of the common pitfalls of Spark application development.
Prerequisites:
Role: Data Engineer
Duration: Full Day
Delta Lake is designed to overcome many problems associated with traditional data lake pipelines and enable ACID transactions on data lakes. This course explores tools and tricks you can use to transform your current data lake pipeline into a highly performant Delta Lake pipeline.
Prerequisites:
Role: Data Engineer
Duration: Full Day
Structured Streaming is a highly efficient way to ingest data from a variety of sources. This hands-on course targets Data Engineers who want to process big data using Apache Spark™ Structured Streaming.
Prerequisites:
Role: Data Scientist
Duration: Full Day
This course focuses on Apache Spark’s machine learning APIs. Students will learn the core APIs for using Spark SQL and other high-level data access tools, as well as Spark’s streaming capabilities. It is delivered as a mixture of lecture and hands-on labs.
Prerequisites:
Role: Data Scientist
Duration: Full Day
Taught entirely in Python, this course offers a thorough overview of deep learning and how to scale it with Apache Spark. Students will learn the fundamentals of neural networks and how to build distributed deep learning models on top of Spark. The course includes hands-on training with Keras, TensorFlow, MLflow, and Horovod to build, tune, and apply models.
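As a starting point for the neural-network fundamentals, here is a tiny Keras model trained on synthetic data (in the course this would be scaled out with Horovod and tracked with MLflow; a TensorFlow installation is assumed):

```python
import numpy as np
from tensorflow import keras

# Synthetic binary-classification data: label is 1 when the features sum > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully-connected network.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(f"final training loss: {history.history['loss'][-1]:.3f}")
```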
Prerequisites:
Role: Data Scientist
Duration: Full Day
Have you ever wondered how computers beat humans in Atari games or the ancient game of Go? Are you tired of the shortcomings of supervised and unsupervised learning? If you answered yes to any of these questions, this course is for you. This course combines theoretical and hands-on aspects of reinforcement learning. Upon completion of this course, students will be able to:
Prerequisites:
Role: Data Scientist
Duration: Full Day
In this hands-on course, data scientists and data engineers learn best practices for managing experiments, projects and models using MLflow. Students build a pipeline to log and deploy machine learning models.
Prerequisites:
Role: Data Scientist
Duration: Full Day
In this course students will learn how to apply machine learning techniques in a distributed environment using SparkR and sparklyr. Students will learn about the Spark architecture and Spark DataFrame APIs, build linear and tree-based models, and perform hyperparameter tuning and pipeline optimization. The class is a combination of lectures, demos, and hands-on labs.
Prerequisite:
Role: Data Scientist
Duration: Half Day
This course will teach you how to do natural language processing at scale. You will apply libraries such as NLTK and Gensim in a distributed setting, as well as SparkML/MLlib, to solve classification, sentiment analysis, and text wrangling tasks. You will apply pre-trained word embeddings, identify when to lemmatize vs. stem your tokens, and generate term frequency-inverse document frequency (TF-IDF) vectors for your dataset. You will also use dimensionality reduction techniques to visualize word embeddings with TensorBoard and apply basic vector arithmetic to embeddings. This course is intended for people who are new to NLP.
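To make the TF-IDF idea concrete, here is a from-scratch sketch on a toy corpus (the course itself uses SparkML/MLlib; the documents below are made up, and the raw idf formula is used where libraries often add smoothing):

```python
import math
from collections import Counter

# A toy corpus of pre-tokenized documents.
docs = [
    "spark makes big data simple".split(),
    "spark streaming processes big data".split(),
    "word embeddings capture meaning".split(),
]

def tfidf(term, doc, corpus):
    # Term frequency: how often the term appears in this document.
    tf = Counter(doc)[term] / len(doc)
    # Inverse document frequency: rarer terms across the corpus score higher.
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

# A rare term gets a high weight; a common term gets a low one.
print(round(tfidf("embeddings", docs[2], docs), 3))
print(round(tfidf("spark", docs[0], docs), 3))
```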
Prerequisite:
Role: SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
In this half-day course, you will learn how Databricks and Spark can help solve real-world problems you face when working with financial data. You’ll learn how to deal with dirty data and how to get started with Structured Streaming and real-time fraud detection. Students will also receive a longer take-home capstone exercise as bonus content to the class where they can apply all the concepts presented.
Prerequisite:
Role: SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
In this half-day course, you will learn how Databricks and Spark can help solve real-world problems you face when working with retail data. You’ll learn how to deal with dirty data, and get started with Structured Streaming and Dashboards. Students will also receive a longer take-home capstone exercise as bonus content to the class where they can apply all the concepts presented.
Prerequisite:
Role: SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
In this half-day course, you will learn how Databricks and Spark can help solve real-world problems you face when working with life sciences data. You’ll learn how to deal with dirty data, create dashboards, and get started with MLlib. Students will also receive a longer take-home capstone exercise as bonus content to the class where they can apply all the concepts presented.
Prerequisite:
Role: SQL Analyst, Data Engineer, Data Scientist
Duration: Half Day
In this half-day course, students will learn how Databricks and Spark can help solve real-world problems faced when working with manufacturing data. Students will learn how to deal with dirty data, optimize data sources and transformations, and get started with Structured Streaming. Students will also receive a longer take-home capstone exercise as bonus content to the class where they can apply all the concepts presented.
Prerequisite:
Role: Data Engineer, Data Scientist
Duration: Half Day
In this half-day course, learners will review the fundamentals of Spark architecture components and concepts, the core components of the DataFrame API, and how to access and use documentation during the exam. Students will prepare for a series of multiple-choice questions and coding challenges that demonstrate an understanding of Spark developer basics.
Prerequisite: None