SparkSheet—Transforming Spreadsheets into Spark DataFrames - Databricks

Business analysts often rely on spreadsheets to build data products. But spreadsheets do not scale, and when it comes to computation, end-user programmers quickly hit a glass ceiling. In this talk I will show how spreadsheet formulas can be automatically transformed into Spark DataFrames using program transformation tools and techniques. Transforming spreadsheet formulas is useful because it enables functional programs prototyped in a spreadsheet (as discussed in [1]) to be automatically transformed into Scala programs that leverage the Spark DataFrames API. Using the spreadsheet grammar reported in [2], we are benchmarking SparkSheet, our program transformation pipeline, against two large data sets. Our next step is to build ML pipelines on top of SparkSheet. I will give attendees a sneak peek at the results achieved so far and at our upcoming publications in the research fields of Program Transformation and End User Programming.

[1] London-2016/Felienne-Hermans-on-Functional-Excel-and-Graphical-Languages
[2] Efthimia Aivaloglou, David Hoepelman & Felienne Hermans. A Grammar for Spreadsheet Formulas Evaluated on Two Large Datasets. Proceedings of SCAM ’15.
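
To make the idea of transforming a formula into a DataFrame expression concrete, here is a minimal sketch in Python. It is not the SparkSheet pipeline and does not use the grammar from [2]; it is a toy recursive-descent translator written for illustration only. It assumes a row-relative formula (the row number in a reference such as A2 is ignored), assumes each spreadsheet column letter maps to a DataFrame column of the same name, and supports only numbers, cell references, parentheses, and the four arithmetic operators. The function names `tokenize` and `to_column_expr` are hypothetical. The output is a string of Scala source built from the Spark functions `col` and `lit`.

```python
import re

# One token per match: a cell reference (column letters kept, row digits dropped),
# a number, or a single-character operator / parenthesis.
TOKEN = re.compile(r"\s*(?:(?P<cell>[A-Z]+)\d+|(?P<num>\d+(?:\.\d+)?)|(?P<op>[-+*/()]))")


def tokenize(formula):
    """Split a formula such as '=A2*B2+10' into (kind, text) tokens."""
    text = formula.strip()
    if text.startswith("="):
        text = text[1:]
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported input: {text[pos:]!r}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def to_column_expr(formula):
    """Translate a formula into a Scala Column expression string (toy subset)."""
    tokens = tokenize(formula)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():  # expr := term (('+' | '-') term)*
        left = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            left = f"({left} {op} {term()})"
        return left

    def term():  # term := factor (('*' | '/') factor)*
        left = factor()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            left = f"({left} {op} {factor()})"
        return left

    def factor():  # factor := cell | number | '(' expr ')'
        kind, text = peek()
        if kind is None:
            raise ValueError("unexpected end of formula")
        advance()
        if kind == "cell":
            return f'col("{text}")'  # assumed mapping: column letter -> column name
        if kind == "num":
            return f"lit({text})"
        if text == "(":
            inner = expr()
            if peek() != ("op", ")"):
                raise ValueError("missing closing parenthesis")
            advance()
            return inner
        raise ValueError(f"unexpected token {text!r}")

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(to_column_expr("=A2*B2+10"))  # prints ((col("A") * col("B")) + lit(10))
```

In a Scala program, the generated string could be pasted into a call such as `df.withColumn("C", ...)` after importing `col` and `lit` from `org.apache.spark.sql.functions`. A real pipeline would also have to handle ranges, absolute references, and spreadsheet functions, none of which this sketch attempts.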

About Oscar Castañeda-Villagrán

Oscar studied Computer Science at Delft University of Technology. He is now a Data Scientist at Xoom, a PayPal service, and a researcher at Universidad del Valle de Guatemala. Oscar is interested in Dataset Search, Learning to Rank, and Apache Spark, and is a proponent of Model-Driven Data Product Design & Development.