eBook

Build fast, reliable data pipelines

Get started with Apache Spark™

For data engineers who want to take advantage of the rapid growth of Apache Spark™ and Delta Lake to build faster, more reliable data pipelines, Databricks is happy to provide “The Data Engineer’s Guide to Apache Spark and Delta Lake.” This eBook features excerpts from the larger “Definitive Guide to Apache Spark” and the “Delta Lake Quick Start.”

Download this eBook to:

  • Walk through the core architecture of a cluster, a Spark application and Spark’s Structured APIs using DataFrames and SQL
  • Get a tour of Spark’s toolset that developers use for different tasks, from graph analysis and machine learning to streaming and integrations
  • Understand how to work with different data types, including Booleans, numbers, strings, dates and timestamps, nulls, complex types and user-defined functions
  • Learn how to get more reliable and higher-quality data with Delta Lake, including loading, updating and rolling back data in your data lake