In this course, data engineers access data where it lives, then apply data-extraction best practices, including schemas, corrupt-record handling, and parallelized code. By the end of this course, you will be able to extract data from multiple sources, use both schema inference and user-defined schemas, and navigate Databricks and Apache Spark™ documentation to source solutions.
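The corrupt-record handling mentioned above corresponds to the modes Spark's JSON reader supports (PERMISSIVE, DROPMALFORMED, FAILFAST). As a minimal plain-Python sketch of the PERMISSIVE idea — parse what you can, and capture malformed records in a separate column instead of failing the job — assuming a simple two-field schema invented for illustration:

```python
import json

# Assumed schema for each JSON line (illustrative field names, not from the course)
EXPECTED_FIELDS = {"id": int, "name": str}

def parse_permissive(lines):
    """Parse JSON lines; malformed records land in a '_corrupt_record' field,
    mimicking the behavior of Spark's PERMISSIVE read mode."""
    rows = []
    for line in lines:
        try:
            obj = json.loads(line)
            # Treat missing or wrongly typed fields as schema violations
            if not all(isinstance(obj.get(k), t) for k, t in EXPECTED_FIELDS.items()):
                raise ValueError("schema mismatch")
            rows.append({**obj, "_corrupt_record": None})
        except (json.JSONDecodeError, ValueError):
            rows.append({k: None for k in EXPECTED_FIELDS} | {"_corrupt_record": line})
    return rows

sample = ['{"id": 1, "name": "a"}', '{"id": "oops"}', "not json"]
rows = parse_permissive(sample)
print(sum(r["_corrupt_record"] is not None for r in rows))  # → 2
```

In real Spark code the equivalent is passing `mode="PERMISSIVE"` (the default) to the JSON reader, which populates a `_corrupt_record` column rather than raising.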
2-4 hours, 75% hands-on
The course is a series of seven self-paced lessons, available in both Scala and Python, each with hands-on exercises. A final capstone project involves writing an end-to-end ETL job that loads semi-structured JSON data into a relational model.
Supported platforms include Databricks Community Edition, Azure Databricks, and Databricks on AWS.
During this course, you will: