Spark + AI Summit 2019 features a number of 1-day training workshops that include a mix of instruction and hands-on exercises to help you improve your Apache Spark skills.
Training is offered as an add-on to the Conference Pass.
Students will need to bring their own laptops with a Chrome or Firefox browser and unfettered access to *.databricks.com.
The Data Science with Apache Spark workshop will show how to use Apache Spark to perform exploratory data analysis (EDA), develop machine learning pipelines, and use the algorithms and utilities available in Spark MLlib's DataFrame-based API. It is designed for software developers, data analysts, data engineers, and data scientists.
It will also cover parallelizing machine learning algorithms at a conceptual level. The workshop will take a pragmatic approach, with a focus on using Apache Spark for data analysis and building models using MLlib, while limiting the time spent on machine learning theory and the internal workings of Spark.
We will work through examples using public datasets that will show you how to apply Apache Spark to help you iterate faster and develop models on massive datasets. This workshop will provide you with tools to be productive using Spark on practical data analysis tasks and machine learning problems. After completing this workshop, you should be comfortable using DataFrames, the DataFrames MLlib API, and related documentation. These building blocks will enable you to use Apache Spark to solve a variety of data analysis and machine learning tasks.
Some experience coding in Python or Scala and a basic understanding of data science topics and terminology are recommended. Experience using Spark and familiarity with the concept of a DataFrame is helpful.
Brief conceptual reviews of data science techniques will be performed before the techniques are used. Labs and demos will be available in both Python and Scala.
This course is aimed at the practicing data scientist who is eager to get started with deep learning, as well as software engineers and technical managers interested in a thorough, hands-on overview of deep learning and its integration with Apache Spark.
The course covers the fundamentals of neural networks and how to build distributed deep learning models on top of Spark. Throughout the class, you will use Keras, TensorFlow, MLflow, and Horovod to build, tune and apply models. This course is taught entirely in Python.
Each topic includes lecture content along with hands-on labs in the Databricks notebook environment.
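As a hint of what the Keras labs involve, the sketch below builds and trains a small feed-forward network. The architecture and random data are invented for illustration, and it assumes a TensorFlow installation (distribution with Horovod and tracking with MLflow are beyond this snippet):

```python
import numpy as np
from tensorflow import keras

# Random toy data standing in for the course datasets
X = np.random.rand(64, 4).astype("float32")
y = (X.sum(axis=1, keepdims=True) > 2.0).astype("float32")

# A small feed-forward network, the kind of model built in the labs
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

# Sigmoid output: one probability per example
preds = model.predict(X, verbose=0)
```

The course builds from networks like this one toward tuning them and training them in a distributed fashion on top of Spark.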
This 1-day course is for data engineers, analysts, architects, DevOps engineers, and team leads interested in troubleshooting and optimizing Apache Spark applications. It covers troubleshooting techniques, performance tuning, best practices, and anti-patterns to avoid when working with Spark applications and queries.
Each topic includes lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. Students may keep the notebooks and continue to use them with the free Databricks Community Edition offering; all examples are guaranteed to run in that environment. Alternatively, each notebook can be exported as source code and run within any Spark environment.
This 1-day course is for data engineers, analysts, architects, data scientists, software engineers, IT operations, and technical managers interested in a brief hands-on overview of Apache Spark.
The course provides an introduction to the Spark architecture, some of the core APIs for using Spark, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs. The class is a mixture of lecture and hands-on labs.
Each topic includes lecture content along with hands-on labs in the Databricks notebook environment. Students may keep the notebooks and continue to use them with the free Databricks Community Edition offering after the class ends; all examples are guaranteed to run in that environment.
BUILDING DATA PIPELINES FOR APACHE SPARK™ WITH DELTA LAKE
Delta Lake is a storage layer, built on top of Apache Spark, that brings reliability and performance to data lakes. This course is for data engineers, architects, data scientists and software engineers who want to use Databricks Delta for building pipelines for data lakes with high data reliability and performance. The course will cover typical data reliability and performance challenges that data lakes face and teach how to address them using Delta. The course ends with a capstone project building a complete data pipeline using Databricks Delta.
TOPICS COVERED INCLUDE
In this course, data scientists and engineers learn best practices for putting machine-learning models into production. It starts with managing experiments, projects, and models using MLflow, then explores various deployment options, including batch predictions, Spark Streaming, and REST APIs. Finally, it covers monitoring machine-learning models once they have been deployed into production.
Databricks Certified Developer for Apache Spark 2.x will not be available after Oct 31, 2019, and subsequently will NOT be offered to candidates at Spark Summit. This exam is superseded by the Databricks Certified Associate for Apache Spark 2.4.
Databricks Certified Associate for Apache Spark 2.4 validates your knowledge of the core components of the DataFrames API as well as a rudimentary knowledge of the Spark architecture. For more information, see the Databricks Certified Associate for Apache Spark 2.4 page.
NOTE: This certification is not affiliated with the Apache Software Foundation.
A testing room will be available all three days of Spark Summit. Availability will be adjusted to accommodate other events at Spark Summit with more information to come.
This half-day lecture is for anyone seeking to learn more about the different certifications offered by Databricks, including the Databricks Certified Associate for Apache Spark 2.4 and our upcoming exams.
It includes test-taking strategies, sample questions, preparation guidelines, and exam requirements for all certifications. The primary goal of this course is to help potential applicants understand the breadth and depth of knowledge on which individuals will be tested and to provide guidelines on how to prepare for the exam.
Attendees who select the prep course will have the option to take either exam after the course is completed.
This is not a programming course.
Please Note: Attending the certification prep course should NOT, by itself, be considered sufficient preparation for any certification exam offered by Databricks.