Gang is currently a software engineer on the Hadoop Compute team at Uber. He builds services and tools to support large-scale data applications. He is a co-creator of the Spark Uber Development Kit (UDK), which provides APIs and tools for engineers to develop and run Spark jobs easily and efficiently. Before Uber, Gang obtained a Master's degree in Computer Science from Carnegie Mellon University.
At Uber, most of the workload on our Hadoop clusters runs on Spark. Examples include applications in mapping, fraud, machine learning, and data science. To make creating and managing Spark jobs easy, we created the Spark Uber Development Kit (UDK). It is a set of tools (including a log debugger, a performance reporter, resource auditing, etc.) and APIs (for job monitoring, message logging, result dispersal, etc.). UDK helps engineers debug, monitor, and optimize Spark jobs easily. It also helps reduce the time needed to create and run Spark jobs from weeks to days. At Uber, engineers have been using UDK on several Spark clusters ranging in size from a few hundred to a thousand machines. In this talk, we discuss Spark UDK and share our experience with the open source community.