Testing Spark: Best Practices

What are the challenges in testing the unique processing features of Spark and Spark Streaming applications? As with any rapidly evolving technology, every new or upgraded Spark feature makes it crucial for your team to ensure the quality and reliability of your Spark-based applications. At Ooyala, we are working on batch and streaming pipelines built on Spark, which requires test strategies that give us the confidence to deploy these services to production.

We have been automating unit- and integration-level tests for Spark-based batch and streaming applications. As part of this effort, we simulated cluster-like conditions and built utilities to feed data in real time to streaming applications (a simplified example of this approach is sketched below). In this talk, we share the challenges, test setup requirements, test strategies, potential solutions, and best practices that we learned while testing our Spark applications.
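The following is a minimal sketch of the kind of test setup the abstract describes, not code from the talk itself: it assumes ScalaTest, uses a local-mode SparkContext to stand in for a cluster, and feeds a Spark Streaming job with an in-memory queueStream instead of a live source. The WordCountSpec class and the word-count logic are hypothetical placeholders for the application under test.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits on older Spark versions
import org.scalatest.{BeforeAndAfterAll, FunSuite}

import scala.collection.mutable

// Sketch: unit-test batch logic on a local SparkContext and drive a
// streaming job with an in-memory queueStream instead of a real source.
class WordCountSpec extends FunSuite with BeforeAndAfterAll {

  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")            // two local threads simulate a small cluster in-process
      .setAppName("spark-test-sketch")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }

  test("batch word count") {
    val counts = sc.parallelize(Seq("a b", "b"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("b") === 2)
  }

  test("streaming word count over a queueStream") {
    val ssc = new StreamingContext(sc, Seconds(1))
    // Each RDD pushed onto the queue becomes one micro-batch of input.
    val queue = mutable.Queue(sc.parallelize(Seq("a b", "b")))
    val results = mutable.ArrayBuffer.empty[(String, Int)]

    ssc.queueStream(queue)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreachRDD(rdd => results ++= rdd.collect())

    ssc.start()
    Thread.sleep(3000)                  // wait for at least one micro-batch to complete
    ssc.stop(stopSparkContext = false)

    assert(results.toMap.getOrElse("b", 0) === 2)
  }
}
```

Running in local mode keeps the test self-contained and fast, while queueStream lets the test control exactly what data arrives in each micro-batch; more realistic integration tests would replace these with a multi-node cluster and a real data feed such as Kafka.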

About Anupama Shetty

Anupama Shetty is a Software Development Engineer in Test on Ooyala's Analytics team. She has worked on big data processing platforms such as Hadoop, Kafka, Storm, and Spark, and has built automation test frameworks for video players, API data verification, and Spark applications such as Spark Job Server and Spark Streaming. She holds a master's degree in Software Engineering from San Jose State University.