Financial Planning and Analysis (FP&A) teams often rely on spreadsheets to build data products that give senior management analysis and information crucial to decision-making. But spreadsheets do not scale, and when it comes to expanding models, FP&A analysts quickly hit a glass ceiling. The code that FP&A analysts write takes the form of spreadsheet formulas.
In this talk I will show how spreadsheet formulas and data can be processed automatically at scale inside a Spark cluster by the driver and worker nodes. Essentially, this means running a spreadsheet at scale inside your Spark cluster. I will show how spreadsheets and their calculated outputs can be transformed into DataFrames for further processing with Spark.
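To make the idea concrete, here is a minimal, hypothetical sketch of treating spreadsheet formulas as code: compile each formula once into a function over a row, then apply it to every row of a dataset. The formula syntax handled, the `compile_formula` helper, and the row layout are all assumptions for illustration; in a real pipeline the compiled function would be wrapped in a Spark UDF so the worker nodes do the computation, but plain Python is used here for brevity.

```python
import re

def compile_formula(formula):
    """Translate a simple spreadsheet formula like '=A1-B1' into a Python
    function over a row dict keyed by column letter ('A', 'B', ...).
    Hypothetical helper: real spreadsheet formulas are far richer."""
    expr = formula.lstrip("=")
    # Replace cell references (e.g. A1, B1) with lookups by column letter.
    expr = re.sub(r"([A-Z])\d+", r"row['\1']", expr)
    # Evaluate with builtins disabled, so only row lookups and arithmetic run.
    return lambda row: eval(expr, {"__builtins__": {}}, {"row": row})

# Example: a margin formula applied to rows of revenue (A) and cost (B).
margin = compile_formula("=A1-B1")
rows = [{"A": 1000, "B": 700}, {"A": 250, "B": 90}]
results = [margin(r) for r in rows]
print(results)  # [300, 160]
```

Each compiled formula could then be registered as a Spark UDF and applied as a new column on a DataFrame, which is the step that lets the cluster, rather than a single desktop, carry the spreadsheet's logic.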
We will also discuss next steps in FP&A data pipelines, including AutoML and the use of such pipelines for Data Science. The broader research topic falls under Model-Driven Data Product Design & Development, which should interest Spark Summit attendees looking for use cases and new opportunities to leverage existing corporate assets, like spreadsheets, to automatically build working software that adds tremendous value at scale.
Session hashtag: #SAISEco2
Oscar studied Computer Science at Delft University of Technology. He is now a Data Scientist at Xoom, a PayPal service. Oscar is interested in Data Management, Dataset Search, Online Learning to Rank, and Apache Spark.