Programming by Examples - Databricks

Programming by examples (PBE) is a new frontier in AI that enables users to create scripts from input-output examples. PBE can provide a 10-100x productivity increase for developers in some task domains. 99% of computer users are non-programmers, and PBE can enable them to create small scripts to automate repetitive tasks. PBE is revolutionizing data wrangling: data scientists spend up to 80% of their time transforming data into a form suitable for machine learning (ML). PBE enables automation of many data manipulation tasks, such as string transformations (e.g., converting “FirstName LastName” to “LastName, FirstName”), column splitting, field extraction from log files and web pages, and normalizing semi-structured spreadsheets into structured tables. Such PBE capabilities have been released inside multiple Microsoft products, including Excel, PowerShell, OMS, and Azure ML Workbench. The synthesized scripts are quite performant, and Azure ML Workbench even enables their execution on large datasets using the Spark runtime.

Another killer application of PBE is repetitive code transformations such as formatting or refactoring, given that developers spend up to 40% of their time refactoring code in an application migration scenario. A key technical challenge in PBE is to search for programs in an underlying domain-specific language that are consistent with the user-provided examples. Our real-time search methodology leverages logical reasoning techniques and neural-guided heuristics.
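To make the search problem concrete, here is a minimal sketch of a brute-force enumerative synthesizer over a toy domain-specific language. It is not the PROSE search methodology (which the abstract describes as combining logical reasoning with neural-guided heuristics); the DSL, the atom names "tok" and "const", the constant pool, and the functions run and synthesize are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy DSL: a program is a tuple of atoms, concatenated in order.
#   ("tok", i)   -> the i-th whitespace-separated word of the input
#   ("const", s) -> the literal string s
CONSTANTS = [", ", " ", "-"]   # assumed pool of literals
MAX_TOKENS = 3                 # assumed upper bound on word positions


def run(program, text):
    """Evaluate a program on one input; return None if it is undefined."""
    tokens = text.split()
    parts = []
    for kind, arg in program:
        if kind == "tok":
            if arg >= len(tokens):
                return None  # input has too few words for this atom
            parts.append(tokens[arg])
        else:
            parts.append(arg)
    return "".join(parts)


def synthesize(examples, max_len=3):
    """Return the first (shortest) program consistent with every example."""
    atoms = [("tok", i) for i in range(MAX_TOKENS)]
    atoms += [("const", c) for c in CONSTANTS]
    for length in range(1, max_len + 1):
        for program in product(atoms, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None  # nothing in the bounded search space fits


examples = [("Ada Lovelace", "Lovelace, Ada")]
program = synthesize(examples)
print(program)                      # the synthesized program
print(run(program, "Alan Turing"))  # apply it to an unseen input
```

Exhaustive enumeration like this grows exponentially with program length, which is one reason a practical system would need the kind of guided search the abstract mentions.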

Another challenge is to resolve the ambiguity in examples, since many programs can satisfy a few examples. Our ML-based ranking techniques often select an intended program from among the many that satisfy the examples. We also leverage active-learning-based user interaction models that facilitate a bot-like conversation with the user. The Microsoft PROSE SDK exposes these generic search and ranking algorithms (for non-commercial use), allowing advanced developers to construct PBE capabilities for new task domains.
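The ambiguity problem can be illustrated with a small sketch. A single example is satisfied by more than one program in a toy DSL, and a ranking function then picks one of them. The hand-written cost function below is only a stand-in assumption for the ML-based ranking mentioned above; the DSL and the names run, consistent_programs, and cost are hypothetical.

```python
from itertools import product


def run(program, text):
    """Concatenate atoms: ("tok", i) is the i-th word, ("const", s) is literal s."""
    tokens = text.split()
    parts = []
    for kind, arg in program:
        if kind == "tok":
            if arg >= len(tokens):
                return None  # program is undefined on this input
            parts.append(tokens[arg])
        else:
            parts.append(arg)
    return "".join(parts)


def consistent_programs(examples, max_len=2):
    """Collect every program (up to max_len atoms) that fits all examples."""
    # Assumption of this toy: literals are harvested from the example outputs.
    constants = sorted({word for _, out in examples for word in out.split()})
    atoms = [("tok", 0), ("tok", 1)] + [("const", c) for c in constants]
    found = []
    for length in range(1, max_len + 1):
        for program in product(atoms, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found


def cost(program):
    """Hand-written heuristic: penalize literal characters, then length."""
    literal_chars = sum(len(arg) for kind, arg in program if kind == "const")
    return 10 * literal_chars + len(program)


examples = [("Ada Lovelace", "Ada")]
candidates = consistent_programs(examples)
best = min(candidates, key=cost)

# Both candidates agree on the example but disagree on a new input,
# which is exactly the ambiguity a ranker has to resolve.
for program in candidates:
    print(program, "->", run(program, "Alan Turing"))
print("ranked first:", best)
```

An input on which the candidates disagree, such as the one printed above, is also a natural question to put to the user in an interactive setting, since the answer rules out at least one candidate.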

This presentation will educate the audience about this new PBE-based programming paradigm: its applications, its form factors inside different products, and the science behind it.

Session hashtag: #Res8SAIS

About Sumit Gulwani

Sumit is a Research manager at Microsoft, leading the PROSE research and engineering team that develops APIs for program synthesis (programming by examples and natural language) and incorporates them into real products. He is the inventor of the popular Flash Fill feature in Microsoft Excel used by hundreds of millions of people. He has published 110+ peer-reviewed papers in top-tier conferences/journals across multiple computer science areas. He is a recipient of the prestigious ACM SIGPLAN Robin Milner Young Researcher Award, ACM SIGPLAN Outstanding Doctoral Dissertation Award, and the President's Gold Medal from IIT Kanpur.