
We are proud to introduce the Getting Started with Apache Spark on Databricks Guide. This step-by-step guide illustrates how to use the Databricks platform to work with Apache Spark. Our just-in-time data platform simplifies common challenges when working with Spark: data integration, real-time experimentation, and robust deployment of production applications.

Databricks provides a simple, just-in-time data platform designed for data analysts, data scientists, and engineers. This step-by-step guide shows you how to use Databricks to solve real-world data science and data engineering scenarios with Apache Spark. It will help you familiarize yourself with the Spark UI, create Spark jobs, load data and work with Datasets, get comfortable with Spark's DataFrames and Datasets API, run machine learning algorithms, and understand the basic concepts behind Spark Streaming.

Instead of worrying about spinning up clusters, maintaining clusters, tracking code history, or upgrading to new Spark versions, you can start writing Spark queries instantly and focus on your data problems.

Visit the Getting Started with Apache Spark on Databricks Guide

The guide helps you get started with Apache Spark and Databricks in six easy steps. It first provides a quick start on how to use open source Apache Spark, then builds on this knowledge to show how to use Spark DataFrames with Spark SQL. In time for Spark 2.0, we will also discuss how to use Datasets and how DataFrames and Datasets are now unified. The guide also has quick starts for Machine Learning and Streaming so you can easily apply them to your data problems. Each of these modules refers to standalone notebooks and datasets, so you can jump ahead if you feel comfortable.

We hope you enjoy the Getting Started with Apache Spark on Databricks Guide, and we will continue updating it with new notebooks and samples as Apache Spark grows.

