Daniel Kang

Research Assistant, Stanford University

Daniel Kang did his bachelor's and M.Eng. at MIT, with his thesis in computational biology. He then went on to do another master's at the University of Cambridge, studying statistics and probability thanks to the generosity of a Churchill Scholarship. After a fateful encounter with Professors Peter Bailis and Matei Zaharia, he’s now slaving away in the Stanford DAWN lab as a PhD student. His current research focuses on deploying (unreliable) machine learning models efficiently and with guarantees. His work has primarily targeted video analytics and autonomous vehicles, but he’s willing to change his mind for food.

Past sessions

Summit Europe 2020 Efficient Query Processing Using Machine Learning

November 18, 2020 04:00 PM PT

Given the rise of deep neural networks (DNNs), querying unstructured data is becoming increasingly feasible: these DNNs can extract structured data from unstructured records. For example, an object detection DNN can extract object types and positions from images, and BERT DNNs can extract relations from text. Unfortunately, these DNNs can be extremely expensive for many applications, costing up to hundreds of thousands of dollars for naive methods of analysis.

In this talk, I'll describe the TASTI system from the Stanford DAWN lab, which we have developed to reduce the cost of queries over unstructured data. We'll first describe how to use proxy scores, which are cheap approximations of expensive DNNs, to accelerate a range of queries (including aggregation, selection, and limit queries, which we explored in the BlazeIt and SUPG systems). We'll then describe how to generate these scores by clustering unstructured data records in a theoretically principled manner. Combined, our techniques can accelerate queries over unstructured data by over 100x compared to naive methods of executing queries.
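To make the idea of proxy scores concrete, here is a minimal Python sketch of a limit query that uses a cheap score to decide which records to send to the expensive model first. It is an illustration under stated assumptions, not the BlazeIt, SUPG, or TASTI implementation: `expensive_oracle`, `proxy_score`, `limit_query`, and the synthetic records are all hypothetical stand-ins invented for this example.

```python
import random


def expensive_oracle(record):
    """Hypothetical stand-in for an expensive DNN: does the record match?"""
    return record["value"] > 0.9


def proxy_score(record):
    """Hypothetical cheap approximation: a noisy view of the true signal."""
    return record["value"] + record["noise"]


def limit_query(records, k, score_fn, oracle_fn):
    """Return up to k matching records and the number of oracle calls made.

    Records are examined in descending order of score_fn, so a good proxy
    puts likely matches first and fewer oracle calls are needed.
    """
    ranked = sorted(records, key=score_fn, reverse=True)
    matches, calls = [], 0
    for record in ranked:
        if len(matches) >= k:
            break
        calls += 1  # each oracle invocation is the expensive step
        if oracle_fn(record):
            matches.append(record)
    return matches, calls


# Synthetic data (an assumption for illustration): about 10% of records match.
rng = random.Random(0)
records = [
    {"id": i, "value": rng.random(), "noise": rng.uniform(-0.05, 0.05)}
    for i in range(10_000)
]

# Proxy-guided scan versus a naive scan in the original record order.
matches, proxy_calls = limit_query(records, 10, proxy_score, expensive_oracle)
_, naive_calls = limit_query(records, 10, lambda r: 0.0, expensive_oracle)

print("oracle calls with proxy scores:", proxy_calls)
print("oracle calls with naive scan:  ", naive_calls)
```

The constant score in the naive run leaves the records in their original order, so the comparison isolates the effect of ranking by the proxy. How large the saving is depends entirely on how well the proxy tracks the expensive model and on how rare the matches are.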

We'll also describe our ongoing work to apply TASTI to real-world applications, including ecological analysis in collaboration with Stanford biologists and wildfire detection.

This work is based on four publications at VLDB (BlazeIt, SUPG, Smol, TASTI), joint work with Professors Peter Bailis and Matei Zaharia. Our code is open-sourced.

Speaker: Daniel Kang