
Screenshot of the Databricks product documentation homepage.

We are proud to announce the launch of a new online guide for Databricks and Apache Spark at docs.databricks.com. Our goal is to create a definitive resource for Databricks users and the most comprehensive set of Apache Spark documentation on the web. As a result, we've dedicated a large portion of the guide to Spark tutorials and How-Tos, in addition to Databricks product documentation.

The content of the guide falls into three broad categories:

  • Documentation on how to use Databricks
  • Documentation on how to administer Databricks
  • A guide for Apache Spark developers

In this blog, I will provide an overview of this new resource and highlight a few key sections.

Documentation on How to Use Databricks

We’ve consolidated product documentation into simple tutorials that walk you through everything you need to write and run Spark code in Databricks. This includes how to spin up a cluster, analyze data in notebooks, run production jobs with notebooks or JARs, use the Databricks APIs, and much more. There are detailed walkthroughs of every aspect of the Databricks UI as well as introductory tutorial videos.

How to use Databricks videos
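
To give a flavor of the notebook workflow these tutorials cover, here is a minimal sketch of a PySpark notebook cell. It assumes you are working in a Databricks notebook (where the `spark` session and the `display` function are provided for you); the file path and column names are hypothetical placeholders.

```python
# A minimal PySpark example of the kind you might run in a Databricks notebook.
# The mount point and column names below are hypothetical placeholders.
from pyspark.sql import functions as F

# Databricks notebooks provide a preconfigured SparkSession named `spark`.
events = spark.read.json("/mnt/example-bucket/events/")

# A simple aggregation: count events per day.
daily_counts = (events
                .groupBy(F.to_date("timestamp").alias("date"))
                .count()
                .orderBy("date"))

# `display` renders the result as an interactive table or chart in the notebook.
display(daily_counts)
```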

In particular, the FAQ and best practices section has many tips to help new users get the most out of Databricks, such as:

  • Using an IDE with Databricks
  • Integrating Databricks with Tableau
  • Using XGBoost and Spark

Documentation on How to Administer Databricks

These documents walk through how to manage the configuration, administration, and other housekeeping aspects of a Databricks account. For example, account owners can learn how to change AWS credentials and view billing details, while account administrators can learn how to add new Databricks users, set up access control, and configure SAML 2.0-compatible identity providers.

Screenshot of the Databricks Administration Guide homepage.

Guide for Apache Spark Developers

While improving the documentation for the Databricks product was essential, we also wanted to give back to the Apache Spark community. We have created Apache Spark’s only Spark SQL Language Manual, simple examples of Apache Spark UDFs in Scala and Python, a SparkR Function Reference, as well as Structured Streaming Examples.
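
To give a sense of the Structured Streaming examples, here is a minimal sketch of the classic streaming word count. It follows the standard Apache Spark pattern rather than any specific snippet from the guide, and assumes a text stream on localhost port 9999 (for example, one started with `nc -lk 9999`).

```python
# A minimal Structured Streaming word count over a text socket.
# Assumes a text stream is available on localhost:9999 (e.g. `nc -lk 9999`).
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read lines from the socket as an unbounded streaming DataFrame.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
word_counts = words.groupBy("word").count()

# Print the complete set of running counts to the console as new data arrives.
query = (word_counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())

query.awaitTermination()
```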

Screenshot of a code sample in the Databricks documentation featuring a one-click copy to clipboard function.

There are many practical code examples throughout the guide, and a simple “copy” button lets you easily drop them into your environment to try them out.

Accessing the Guide in Databricks

To make it easy for users to find the code they need, we have also revamped the in-product search so you can quickly look up and reference code examples. For example, finding an example of a Python Spark UDF is just a couple of keystrokes away!

https://www.youtube.com/watch?v=nSRX_9SbEBM
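
As a sketch of what such a search result covers, here is a simple Python Spark UDF example. The DataFrame, column, and function names are illustrative rather than taken from the guide.

```python
# A simple Python UDF example: wrap a Python function and apply it to a column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("PythonUDFExample").getOrCreate()

# Wrap a plain Python function as a UDF that squares an integer column.
square = udf(lambda x: x * x if x is not None else None, IntegerType())

df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
df.select("value", square("value").alias("value_squared")).show()

# The same function can also be registered for use from SQL expressions.
spark.udf.register("square", lambda x: x * x, IntegerType())
spark.sql("SELECT id, square(id) AS id_squared FROM range(4)").show()
```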

What’s Next

This release has been a couple of months in the making, and we are just getting started. We will be constantly adding new content that answers the most common questions, as well as deep dives into more sophisticated use cases. Be sure to browse the docs, bookmark them, and share them with friends, because this resource will only continue to grow!
