Mayank Bansal is currently working as a Staff engineer at Uber in data infrastructure team. He is co-author of Peloton. He is Apache Hadoop Committer and Oozie PMC and Committer. Previously he was working at ebay in hadoop platform team leading YARN and MapReduce effort. Prior to that he was working at Yahoo and worked on Oozie.
Zeus is an efficient, highly scalable and distributed shuffle as a service which is powering all Data processing (Spark and Hive) at Uber. Uber runs one of the largest Spark and Hive clusters on top of YARN in industry which leads to many issues such as hardware failures (Burn out Disks), reliability and scalability challenges. Zeus is built ground up to support hundreds of thousands of jobs and millions of containers which shuffles petabytes of shuffle data. Zeus has changed the paradigm of current external shuffle which resulted in far better performance for shuffle. Although the shuffle data is getting written Remote however the performance is better or the same for most of the Jobs. In this talk we'll take a deep dive into the Zeus architecture and describe how it's deployed at Uber. We will then describe how it's integrated to run shuffle for Spark, and contrast it with Spark's built-in sort-based shuffle mechanism . We will also contrast Zeus performance numbers with different storage systems backed by external shuffle e.g. NFS and HDFS. We will also talk about future roadmap and plans for Zeus.