How to Write Batch or Streaming Data Pipelines with Apache Beam in 15 mins - Databricks

Apache Beam is an open-source model and set of tools that help you create batch and streaming data-parallel processing pipelines. These pipelines can be written with the Java or Python SDK and run on any of several Apache Beam pipeline runners, including the Apache Spark runner. This talk gives an overview and demo of creating pipelines in Apache Beam and executing them on Apache Spark.