Spark makes big data applications easier to work with, but understanding how to actually deploy the platform can be quite confusing. This talk will present operational tips and best practices based on supporting our (Databricks) customers with Spark in production. We will discuss how your choice of storage and overall pipeline design influence performance. We will review Spark’s configuration subsystem and discuss which configuration properties are relevant to you. We’ll also review common misconfigurations that prevent users from getting the most out of their Spark deployment. Finally, we’ll discuss issues frequently encountered in customer environments and present debugging techniques for getting to the root cause. This talk should help answer the following questions: How should I deploy my Spark application (cluster size, storage format, etc.)? How can I improve the performance of my Spark application? What’s causing my Spark application to crash?