
8 Steps for a Developer to Learn Apache Spark with Delta Lake

Learn how Apache Spark and Delta Lake unify all your data — big data and business data — on one platform for BI and ML.

What’s holding you back from unlocking the full potential of your data? You need a platform that can process and hold all your data — both raw data and business data — and deliver it to all your downstream users for BI and ML.

Apache Spark™ 2.x marks a major shift toward greater ease of use, higher performance and smarter unification of APIs across Spark components. For the data being processed, Delta Lake brings reliability and performance to data lakes, with capabilities such as ACID transactions, schema enforcement, DML commands and time travel.

In this eBook, we offer a step-by-step guide to technical content and related assets that will help you learn Apache Spark and Delta Lake. Whether you’re just getting started or are already an accomplished developer, these steps will let you explore the benefits of these open source projects.

Here are the topics we will cover:

  • Why Apache Spark and Delta Lake
  • Apache Spark and Delta Lake concepts, key terms and keywords
  • Advanced Apache Spark internals and core
  • DataFrames, Datasets and Spark SQL essentials
  • Graph processing with GraphFrames
  • Continuous applications with structured streaming
  • Machine learning for humans
  • Data reliability challenges for data lakes
  • Delta Lake for ACID transactions, schema enforcement and more
  • Unifying batch and streaming data pipelines
