Apache Spark is a dynamic execution engine that can take relatively simple Scala code and create complex, optimized execution plans. In this talk, we will describe how user code translates into Spark drivers, executors, stages, tasks, transformations, and shuffles. We will also discuss various sources of information on how Spark applications use hardware resources, and show how application developers can use this information to write more efficient code. We will demonstrate how Pepperdata’s products can clearly identify such resource usage and tie it to specific lines of code, and how Spark application owners can quickly identify the root causes of common problems such as job slowdowns, inadequate memory configuration, and Java garbage collection issues.
Michael has a PhD in Optimization and Decision Science from the University of Pennsylvania, with a focus on constrained resource allocation problems. Michael leads the Data Science and Engineering initiatives at Cadent, a leading provider of media, advertising technology, and data solutions for the pay-TV industry. He has also taught Convex Optimization at UPenn. He has been a practicing data-driven business architect since 2005, having worked on various subcontracts during his undergraduate and graduate studies.
In his current role as a Data Engineer at Cadent, Stefan focuses on big data computational platforms such as Spark, which enable Cadent to apply data science and machine learning to achieve faster and better business results. Prior to Cadent, Stefan was an Application Developer at QVC, where he built logistics and warehouse software solutions for the retail industry. He has also spent time as a SQL Developer at CCP, a Senior Software Analyst at EXE Technologies, and an IT Consultant at UNISYS. Stefan received his PhD in Computer Science from the Bulgarian Academy of Sciences, where he also served as an Assistant Professor.