Thanks to Spark, writing big data applications has never been easier… at least until they stop being easy! At Lightbend we’ve helped our customers out of a number of hidden Spark pitfalls. Some crop up often: the ever-persistent OutOfMemoryError, the confusing NoSuchMethodError, shuffle and partition management, and so on. Others occur less frequently: an obscure configuration setting affecting SQL broadcasts, struggles with speculative execution, a failing stream recovery due to RDD joins, S3 file reads that lead to hangs, and more. All are intriguing! In this session we will provide insights into their origins and show how you can avoid making the same mistakes. Whether you are a seasoned Spark developer or a novice, you should learn some new tips and tricks that could save you hours or even days of debugging.
Justin is a software journeyman, continuously learning and honing his skills. He currently applies that knowledge to developer support at Lightbend. As much as he loves to learn, he also loves to share what he knows by teaching and helping others. He has authored three online courses for Pluralsight, including one on Spark fundamentals, is one of the top Spark answerers on Stack Overflow, and organizes the Pittsburgh Scala meetups.