Ganesh is a Senior Member of Technical Staff at Salesforce.com, where he has been working on the Force.com platform.
Overview: Force.com is a platform as a service (PaaS) that lets customers develop custom applications that integrate with the main Salesforce.com application. As part of the platform's production readiness testing, we test for regressions in customers' production applications. In this talk, we explain how we leverage Spark to help with this large-scale effort.

Extended Description: Salesforce customers have written hundreds of millions of automated tests for their custom applications, and that number keeps growing. As part of our pre-release regression testing, we run these tests twice: once with the old platform code and once with the new platform code. The two runs must behave identically for the new platform code to pass regression testing and be ready for production. Our application stack is set up to run these hundreds of millions of tests, store their results, analyze them, and find and fix bugs, all in a few weeks. We use Spark jobs to filter the results and apply various transformations to them. Once cleaned up, the results go through an unsupervised learning pipeline written in Spark MLlib, which clusters them to separate real issues from non-issues and known issues.

Key takeaways: Spark MLlib, Spark at scale, business use case, improved operational efficiency.
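
To make the two-run comparison described above concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark pipeline from the talk: the record shape (`test_id` mapped to an `(outcome, message)` pair), the function names, and the sample data are all hypothetical, and grouping by a normalized failure message is a deliberately simple stand-in for the unsupervised clustering done in Spark MLlib.

```python
import re
from collections import defaultdict


def find_regressions(old_run, new_run):
    """Return tests whose result differs between the old and new platform runs.

    Both arguments map a test id to an (outcome, message) tuple.
    This record shape is an assumption made for illustration.
    """
    diffs = {}
    for test_id, old_result in old_run.items():
        new_result = new_run.get(test_id)
        # Only tests present in both runs can be compared.
        if new_result is not None and new_result != old_result:
            diffs[test_id] = (old_result, new_result)
    return diffs


def normalize(message):
    """Collapse volatile details so similar failures share one signature."""
    message = re.sub(r"'[^']*'", "<str>", message)  # quoted literals
    message = re.sub(r"\d+", "<num>", message)      # line numbers, counts, ids
    return message.lower().strip()


def group_by_signature(diffs):
    """Group differing tests by the normalized message from the new run."""
    groups = defaultdict(list)
    for test_id, (_old, (_outcome, message)) in sorted(diffs.items()):
        groups[normalize(message)].append(test_id)
    return dict(groups)


# Hypothetical sample data: t3 fails the same way in both runs, so it is
# not a regression; t2 and t4 newly fail with near-identical messages.
old_run = {
    "t1": ("pass", ""),
    "t2": ("pass", ""),
    "t3": ("fail", "Limit exceeded: 101 queries"),
    "t4": ("pass", ""),
}
new_run = {
    "t1": ("pass", ""),
    "t2": ("fail", "Null pointer at line 12"),
    "t3": ("fail", "Limit exceeded: 101 queries"),
    "t4": ("fail", "Null pointer at line 48"),
}

diffs = find_regressions(old_run, new_run)
groups = group_by_signature(diffs)
print(groups)
```

In this sketch, tests that behave identically in both runs drop out at the comparison step, and the remaining differences collapse into a small number of signatures to triage. A clustering approach would go further by grouping messages that are similar but not textually identical after normalization.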