Apache Spark has rapidly emerged as the de facto standard for big data processing and data science across all industries. The use cases range from providing recommendations based on user behavior to analyzing millions of genomic sequences to accelerate drug innovation and development for personalized medicine.

Our engineers, including the team that started the Spark research project at UC Berkeley, which later became Apache Spark, continue to drive Spark development to make these transformative use cases a reality. Through the Databricks Blog, they regularly highlight new Spark releases and features, provide technical tutorials on Spark components, and share practical implementation tools and tips.

This e-book, the first of a series, offers a collection of the most popular technical blog posts written by leading Spark contributors and members of the Spark PMC, including Matei Zaharia, the creator of the Spark research project at UC Berkeley; Reynold Xin, Spark’s chief architect; Michael Armbrust, the architect behind Spark SQL; Xiangrui Meng and Joseph Bradley, the drivers of Spark MLlib; and Tathagata Das, the lead developer behind Spark Streaming, to name a few.

These blog posts highlight many of the major developments designed to make Spark analytics simpler. They are organized into three sections:

  • Section 1: An Introduction to the Apache Spark APIs for Analytics
  • Section 2: Tips and Tricks in Data Import
  • Section 3: Real-World Case Studies of Spark Analytics with Databricks

Included in this e-book are recently created Databricks notebooks in Python, Scala, SQL, R, and Markdown that will help you experiment with and visualize Apache Spark analytics. If you do not have access to Databricks, sign up for Databricks Community Edition for free!

Whether you are just getting started with Spark or are already a Spark power user, this e-book will arm you with the knowledge to be successful on your next Spark project.

Get the e-book here.
