Dr. Jeffrey Yau

Head of Data Science, Walmart Labs

Jeffrey is the Head of Data Science at the Store Technology Group of Walmart U.S. Technology. His prior roles include Chief Data Scientist / Global Head of Data Science at AllianceBernstein, a global asset-management firm that managed over $500 billion in assets; Vice President of Data Science at Silicon Valley Data Science; Head of Risk Analytics and Quantitative Modeling at Charles Schwab Corporation; and Director of Risk Consulting at KPMG. He has also taught economics, econometrics, finance, statistics, and machine learning at UC Berkeley, Cornell, NYU, University of Pennsylvania, and Virginia Tech. Jeffrey is active in the data science community and often speaks at data science conferences in the U.S., Europe, and Asia. He has many years of experience applying a wide range of econometric and machine learning techniques to create analytic solutions for financial institutions, various businesses, and policy institutions. Jeffrey holds a Ph.D. and an M.A. in Economics from the University of Pennsylvania and a B.S. in Mathematics and Economics from UCLA.

UPCOMING SESSIONS

Building Time Series Forecasting Models using Neural Network and Statistical Models
Summit 2020

Time series forecasting is both a fascinating subject to study and an important technique frequently applied in industry, government, and academic settings. Example applications include demand forecasting, inventory planning, marketing strategy planning, capital budgeting, pricing, machine predictive maintenance, and macroeconomic forecasting, to name a few. Forecasting typically requires time series data, which are now ubiquitous both within and outside the data science field. Examples include weekly initial unemployment claims, tick-level stock prices, weekly company sales, the daily number of steps recorded by a wearable, machine performance measurements recorded by sensors, and key performance indicators of business functions.

Wrangling, analyzing, and systematically modeling time series data for forecasting require a different set of techniques because of the temporal dependence inherent in time series. This tutorial discusses some of the most fundamental concepts and techniques for building and deploying time series forecasting models. It is designed for data scientists who are at the beginning of their journey of analyzing time series data and producing time series forecasts, and it covers the key differences between time series data and cross-sectional data, manipulation of time series, exploratory time series data analysis using statistics (and their graphical representations), and some of the most important classes of statistical and machine learning time series models. The concepts and techniques discussed in this tutorial form part of the foundation for learning more advanced time series methods. Specific time series models discussed include Autoregressive-type models and Recurrent Neural Networks. The former is one of the most important classes of statistical time series models and has been widely applied in both industry and academia over the last several decades, while the latter is a neural network architecture that is suitable for time series forecasting and has gained popularity in recent years.
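As a rough illustration of what an Autoregressive-type model looks like, the sketch below simulates an AR(1) process, y[t] = phi * y[t-1] + e[t], and recovers phi by least squares. It is not taken from the tutorial materials: the function names `simulate_ar1` and `fit_ar1`, the value phi = 0.7, the seed, and the sample size are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate n observations of y[t] = phi * y[t-1] + e[t], e[t] ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # local generator so the run is repeatable
    y = [0.0]                  # start the process at zero (an assumption)
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Least-squares slope of y[t] on y[t-1], with no intercept term."""
    numerator = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    denominator = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return numerator / denominator


series = simulate_ar1(phi=0.7, n=2000)
phi_hat = fit_ar1(series)
print(f"true phi = 0.7, estimated phi = {phi_hat:.3f}")
```

With a stationary process (|phi| < 1) and a few thousand observations, the estimate should land close to the true coefficient. In practice a library such as statsmodels would normally handle estimation, lag selection, and diagnostics.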

What you’ll learn:

  • The key characteristics of time series data
  • Statistics for summarizing time series
  • Graphical techniques to describe the characteristics of time series
  • Essential concepts and techniques required to appropriately apply the Autoregressive-type and Neural Network models in practice, such as:
    • Mathematical formulation
    • Statistical assumptions of the AR-type models
    • Data structure required for implementing various types of Recurrent Neural Networks in Keras
    • Implementation of these models in Python (and related libraries) using simulated and real-world time-series data
    • Model evaluation
    • Model selection
    • Producing forecasts
    • The advantages and disadvantages of these models when applying them in practice
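
The "data structure" item above can be made concrete with a small sketch. Recurrent layers in Keras generally expect input arranged as (samples, timesteps, features). The helper below, whose name `make_windows` and toy series are hypothetical, builds that arrangement from a univariate series using plain nested lists. Converting `X` to a NumPy array before passing it to a model is assumed but not shown.

```python
def make_windows(series, timesteps):
    """Build one-step-ahead training pairs from a univariate series.

    X is nested as [samples][timesteps][features] with a single feature;
    y holds the observation that immediately follows each window.
    """
    if timesteps < 1 or timesteps >= len(series):
        raise ValueError("timesteps must be between 1 and len(series) - 1")
    X, y = [], []
    for start in range(len(series) - timesteps):
        window = series[start:start + timesteps]
        X.append([[value] for value in window])  # wrap each value: 1 feature
        y.append(series[start + timesteps])      # target is the next value
    return X, y


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_windows(series, timesteps=3)

# Dimensions in the order (samples, timesteps, features)
print(len(X), len(X[0]), len(X[0][0]))  # prints: 3 3 1
print(y)                                # prints: [13.0, 14.0, 15.0]
```

Each window keeps its observations in time order, which is the property that distinguishes this layout from a shuffled cross-sectional feature matrix.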

Prerequisites

To benefit from this tutorial, attendees should:

  • Have working knowledge of Python
  • Have working knowledge of machine learning and the classical linear regression model
  • Bring a laptop with the following installed:
    • Anaconda (Python 3.7 version): https://www.anaconda.com/distribution/
    • statsmodels: https://www.statsmodels.org/stable/index.html
    • Keras: https://keras.io/

PAST SESSIONS

Time Series Forecasting Using Recurrent Neural Network and Vector Autoregressive Model: When and How
Summit 2018

Given the resurgence of neural network-based techniques in recent years, it is important for data science practitioners to understand how to apply these techniques and the tradeoffs between neural network-based and traditional statistical methods. This lecture discusses two specific techniques: Vector Autoregressive (VAR) Models and Recurrent Neural Networks (RNN). The former is one of the most important classes of multivariate time series statistical models applied in finance, while the latter is a neural network architecture that is suitable for time series forecasting. I'll demonstrate how they are implemented in practice and compare their advantages and disadvantages. Real-world applications, demonstrated using Python and Spark, are used to illustrate these techniques. While not the focus of this lecture, exploratory time series data analysis using time-series plots, plots of autocorrelation (i.e. correlograms), plots of partial autocorrelation, plots of cross-correlations, histograms, and kernel density plots will also be included in the demo.

The attendees will learn:

  • The formulation of a time series forecasting problem statement in the context of VAR and RNN
  • The application of Recurrent Neural Network-based techniques in time series forecasting
  • The application of Vector Autoregressive Models in multivariate time series forecasting
  • The pros and cons of using VAR and RNN-based techniques in the context of financial time series forecasting
  • When to use VAR and when to use RNN-based techniques

Session hashtag: #SAISDL4
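
For readers unfamiliar with the correlogram mentioned in this session description, the following sketch computes the sample autocorrelations that such a plot displays. It is an illustration only, not code from the session: the function name `sample_acf` and the toy series are assumptions, and it uses the common estimator that divides each lagged sum of products by the lag-0 sum of squared deviations.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag.

    Each value is the lag-k sum of products of deviations from the mean,
    divided by the lag-0 sum of squared deviations.
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print([round(r, 3) for r in acf])  # prints: [1.0, 0.4, -0.1]
```

Plotting these values against the lag gives the correlogram. A slow decay across lags is a typical sign of strong temporal dependence in the series.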