Apache Spark is a gift to the big data community, which adds tons of new features on every release. However, it’s difficult to manage petabyte-scale Hadoop clusters with hundreds of edge nodes, multiple Spark releases and demonstrate operational efficiencies and standardization. In order to address these challenges, Paypal has developed and deployed a REST0based Spark platform: Spark Compute as a Service (SCaaS),which provides improved application development, execution, logging, security, workload management and tuning. This session will walk through the top challenges faced by PayPal administrators, developers and operations and describe how Paypal’s SCaaS platform overcomes them by leveraging open source tools and technologies, like Livy, Jupyter, SparkMagic, Zeppelin, SQL Tools, Kafka and Elastic. You’ll also hear about the improvements PayPal has added, which enable it to run greater than 10,000 Spark applications in production effectively. Session hashtag: #SFexp17
Prabhu Kasinathan is the chief data engineer in Big Data Platform at Paypal with 5+ years of big data experience. He is creating APIs, tools and services for Spark platform to support multi-tenancy and large scale computation-intensive applications. He is an expert in building data warehousing solutions on Hadoop and Teradata platform with 11+ years of data experience.