Testing Apache Spark—Avoiding the Fail Boat Beyond RDDs

As Spark continues to evolve, we need to revisit our testing techniques to support Datasets, streaming, and more. This talk expands on “Beyond Parallelize and Collect” (you do not need to have seen it) to discuss how to create large-scale test jobs while supporting Spark’s latest features. We will explore the difficulties of testing streaming programs, look at options for setting up integration testing with Spark beyond just local mode, and examine best practices for acceptance tests.
Session hashtag: #EUeco4
