Vishal is a software engineer at Microsoft’s AI Development Acceleration Program (MAIDAP). He works with teams across the company to build and productize machine learning solutions. Currently, he is working with Azure Cognitive Services to explore the next generation of natural language understanding techniques in question-answering systems. Previously, he worked with Office 365 and Azure Gray Systems Lab in domains including time series forecasting, unsupervised clustering, and natural language processing. He graduated from Rutgers University with an undergraduate degree in computer science. Beyond work, he mentors high school students on their startup ideas and promotes AI literacy for all.
May 26, 2021 03:50 PM PT
A key benefit of serverless computing is that resources can be allocated on demand, but the quantity of resources to request, and allocate, for a job can profoundly impact its running time and cost. For a job that has not yet run, how can we provide users with an estimate of how the job’s performance changes with provisioned resources, so that users can make an informed choice upfront about cost-performance tradeoffs?
This talk will describe several related research efforts at Microsoft to address this question. We focus on optimizing the amount of computational resources that control a data analytics query’s achieved intra-parallelism. These efforts apply machine learning models to query characteristics to predict either the run time or the Performance Characteristic Curve (PCC) as a function of the maximum parallelism that the query will be allowed to exploit.
The AutoToken project uses models to predict the peak number of tokens (resource units) for a recurring SCOPE job, which is determined by the maximum parallelism that the job can ever exploit while running in Cosmos, an exascale big data analytics platform at Microsoft. AutoToken_vNext, or TASQ, predicts the PCC as a function of the number of allocated tokens (limited parallelism). The AutoExecutor project uses models to predict the PCC for Apache Spark SQL queries as a function of the number of executors. The AutoDOP project uses models to predict the run time for SQL Server analytics queries, running on a single machine, as a function of their maximum allowed Degree Of Parallelism (DOP).
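To make the idea of a PCC concrete, here is a minimal Python sketch of how a predicted curve could inform an allocation choice. It is illustrative only and does not represent the models used in these projects: the Amdahl-style curve shape, the function names `pcc_runtime` and `smallest_allocation`, and all constants are assumptions introduced for this example.

```python
def pcc_runtime(tokens, serial_s=120.0, parallel_s=3600.0):
    """Hypothetical PCC: predicted run time in seconds for a token count.

    Assumes an Amdahl-style shape (fixed serial part plus a perfectly
    divisible parallel part). A real PCC would come from a trained model.
    """
    if tokens < 1:
        raise ValueError("tokens must be >= 1")
    return serial_s + parallel_s / tokens


def smallest_allocation(max_tokens, slowdown_tolerance=0.10):
    """Return the fewest tokens whose predicted run time stays within
    slowdown_tolerance of the predicted run time at max_tokens."""
    best = pcc_runtime(max_tokens)
    limit = best * (1.0 + slowdown_tolerance)
    for tokens in range(1, max_tokens + 1):
        if pcc_runtime(tokens) <= limit:
            return tokens
    return max_tokens


if __name__ == "__main__":
    # Print a few points on the curve with their resource cost (token-seconds).
    for n in (1, 10, 50, 100):
        rt = pcc_runtime(n)
        print(f"tokens={n:4d}  runtime={rt:8.1f}s  token_seconds={n * rt:10.1f}")
    print("suggested allocation:", smallest_allocation(100))
```

Under these assumed constants, the curve flattens as tokens increase, so a user willing to accept a small slowdown could request noticeably fewer tokens than the peak. That is the kind of upfront cost-performance tradeoff a predicted PCC is meant to expose.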
We will present our approaches and prediction results for these scenarios, discuss some common challenges that we handled, and outline some open research questions in this space.