Navdeep is a Hacker Scientist at H2O.ai. He graduated from California State University, East Bay with an M.S. in Computational Statistics, a B.S. in Statistics, and a B.A. in Psychology (minor in Mathematics). During his education he developed interests in machine learning, time series analysis, statistical computing, data mining, and data visualization.
Before joining H2O.ai, he worked at a couple of startups and at Cisco Systems, Inc., focusing on data science, software development, and marketing research. Before that, he was a consultant at FICO, working with small to mid-sized banks in the U.S. and South America on risk management.
Prediction by a machine learning model is fundamentally the execution of a computer program. In this case, the program's rules are learned by the computer from training data instead of being programmed by a human. Like all good programs, machine learning models should be debugged to discover and remediate errors. When debugging increases accuracy on holdout data, increases transparency into model mechanisms, identifies or reduces hackable attack surfaces, or decreases disparate impact, it also enhances trust in and interpretability of model mechanisms and predictions. Navdeep Gill identifies several standard techniques in the context of model debugging (disparate impact analysis, residual analysis, and sensitivity analysis) and introduces novel applications such as global and local explanation of model residuals.
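As a rough illustration of the three standard techniques named in this abstract, the sketch below computes a disparate impact ratio, a per-group mean absolute residual, and a simple one-feature sensitivity check. It is not taken from the talk: the records, the 0.5 threshold, the group labels, and the `toy_score` function are all hypothetical, chosen only to make the example self-contained.

```python
import math
from statistics import mean

# Hypothetical holdout records: (group, true label, predicted probability).
RECORDS = [
    ("a", 1, 0.9), ("a", 0, 0.2), ("a", 1, 0.7), ("a", 0, 0.6),
    ("b", 1, 0.4), ("b", 0, 0.1), ("b", 1, 0.8), ("b", 0, 0.3),
]
THRESHOLD = 0.5  # assumed cutoff for a favorable decision


def rows_for(records, group):
    """Select the records that belong to one group."""
    return [r for r in records if r[0] == group]


def favorable_rate(rows):
    """Share of rows whose predicted probability meets the cutoff."""
    return mean(1 if p >= THRESHOLD else 0 for _, _, p in rows)


def adverse_impact_ratio(records, protected, reference):
    """Disparate impact: favorable rate of one group relative to another.

    Assumes the reference group has a nonzero favorable rate.
    """
    return (favorable_rate(rows_for(records, protected))
            / favorable_rate(rows_for(records, reference)))


def mean_abs_residual(rows):
    """Residual analysis: average absolute gap between label and prediction."""
    return mean(abs(y - p) for _, y, p in rows)


def toy_score(income, debt):
    """Hypothetical stand-in for a trained model's scoring function."""
    return 1 / (1 + math.exp(-(0.8 * income - 1.2 * debt)))


def sensitivity(score_fn, inputs, feature, delta):
    """Sensitivity analysis: change in score when one input is perturbed."""
    perturbed = dict(inputs)
    perturbed[feature] += delta
    return score_fn(**perturbed) - score_fn(**inputs)


if __name__ == "__main__":
    print("adverse impact ratio (b vs a):",
          adverse_impact_ratio(RECORDS, "b", "a"))
    for group in ("a", "b"):
        print("mean abs residual,", group, ":",
              mean_abs_residual(rows_for(RECORDS, group)))
    base = {"income": 1.0, "debt": 0.5}
    print("sensitivity to income:", sensitivity(toy_score, base, "income", 0.1))
```

In practice these checks would run against a real model's holdout predictions. Large residuals concentrated in one group, a ratio far from 1, or a score that swings sharply under a small perturbation are each a cue to look more closely at the model.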
The rsparkling R package is an extension package for sparklyr (an R interface for Apache Spark) that creates an R front end for the Sparkling Water Spark package from H2O. It provides an interface to H2O's high-performance, distributed machine learning algorithms on Spark, using R. The main purpose of the package is to connect sparklyr to H2O's machine learning algorithms. In this session, Gill will introduce the basic architectures of rsparkling, H2O Sparkling Water, and sparklyr, and go over how these pieces work together to form a cohesive machine learning framework. In addition, you'll learn about various ways of using rsparkling in production. The session will conclude with a live demo of rsparkling showing an end-to-end use case of data ingestion, munging, and machine learning. Session hashtag: #SFdev15