eBook

Unlock the potential of your data

Learn how Apache Spark™ and Delta Lake unify all your data — big data and business data — on one platform for BI and ML.

Apache Spark 3.x marks a major shift: greater ease of use, higher performance and smarter unification of APIs across Spark components. For the data being processed, Delta Lake brings reliability and performance to data lakes, with capabilities like ACID transactions, schema enforcement, DML commands and time travel.

This eBook is a step-by-step guide to the technical content and related assets that will help you learn Apache Spark and Delta Lake. Whether you’re just getting started or you’re already an accomplished developer, explore the benefits of these open source projects.

Here are the 8 steps we’ll cover:

  1. Why Apache Spark and Delta Lake
  2. Apache Spark concepts, key terms and keywords
  3. Advanced Apache Spark internals and core
  4. DataFrames, Datasets and Spark SQL essentials
  5. Graph processing with GraphFrames
  6. Continuous applications with Structured Streaming
  7. Machine learning for humans
  8. Reliable data lakes and data pipelines