We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide shows how to use the Databricks platform to work with Apache Spark. Our just-in-time data platform simplifies common challenges when working with Spark: data integration, real-time experimentation, and robust deployment of production applications.
Databricks is designed for data analysts, data scientists, and engineers. Using Databricks, this guide walks you through real-world data science and data engineering scenarios with Apache Spark. It will help you become familiar with the Spark UI, create Spark jobs, load data, work with Spark's DataFrame and Dataset APIs, run machine learning algorithms, and understand the basic concepts behind Spark Streaming.
Instead of worrying about spinning up clusters, maintaining clusters, tracking code history, or upgrading to new Spark versions, you can start writing Spark queries instantly and focus on your data problems.
The guide helps you get started with Apache Spark and Databricks in six easy steps. It first provides a quick start to open source Apache Spark, then builds on that knowledge to show how to use Spark DataFrames with Spark SQL. In time for Spark 2.0, we also discuss how to use Datasets and how the DataFrame and Dataset APIs are now unified. The guide also includes quick starts for machine learning and streaming so you can apply them to your own data problems. Each module links to standalone notebooks and datasets, so you can jump ahead if you feel comfortable:
- Quick Start: Quick Start into Apache Spark using Python or Scala
- Datasets: Examining IoT Device Data Using Datasets
- DataFrames: Analyzing City Population vs. Median Home Sale Price using DataFrames
- Machine Learning: Performing Linear Regression on City Population vs. Median Home Sale Price
- Streaming: Jump Start into Spark Streaming by Performing a Streaming Word Count
- What’s Next: Additional resources to learn more about Apache Spark
We hope you enjoy the Getting Started with Apache Spark on Databricks Guide, and we will continue updating it with new notebooks and samples as Apache Spark grows.