A crucial but not particularly interesting part of developing Spark applications is packaging them for deployment. Although there are many successful notebook-style interactive environments such as Apache Zeppelin and Databricks, any serious project with a number of dependencies is usually built as a separate Scala project in a version-controlled source tree. For this, the official documentation tells us to use SBT or Maven's assembly plugin to make a fat jar and run the spark-submit script, but this process significantly slows down the development cycle of coding, assembling, configuring, debugging, and deploying. This presentation will introduce CueSheet, an open-source framework built around Apache Spark that accelerates the development of Spark applications. Originally developed at Kakao Corp., CueSheet removes the need to even open a terminal by taking care of packaging, submitting, and deploying Spark applications. It thus enables developers to focus only on the important parts: programming and debugging. Without any special configuration, CueSheet makes it possible not only to launch Spark applications directly from the IDE but also to debug them by stepping through breakpoints. These advantages, which are made possible by a number of JVM tricks that we will briefly go over, have tremendously boosted the productivity of Spark application development in a number of teams at Kakao. The presentation will also discuss how CueSheet helps to cleanly separate the configuration of a Spark application from its code in a reusable manner.
Jong Wook Kim is currently a doctoral student studying Music Technology at New York University, focusing on music recommendation and music information retrieval. Previously, he worked on building real-time recommender systems at Kakao, a Korean internet company serving more than 40 million monthly active users. Prior to Kakao, he was an MMORPG server programmer at NCSOFT. He has an MS in computer science and engineering from the University of Michigan and a BS in electrical engineering from the Korea Advanced Institute of Science and Technology. He is active in open source communities and has contributed to Apache S2Graph.