Xiang is a data scientist on the Data Science algorithms team at Affirm, where he works on production models that make real-time underwriting decisions and on model interpretability. He is also interested in topics such as distributed machine learning and machine learning servicing.
The Shapley algorithm is a model interpretation algorithm that is well recognized in both industry and academia. However, its runtime complexity is exponential in the number of features, and existing implementations take a very long time to generate feature contributions for even a single instance, so it has found limited practical use in industry. To explain model predictions at scale, we implemented the Shapley IME algorithm in Spark. To our knowledge, this is the first Spark implementation of the Shapley algorithm that scales to large datasets and works with most ML model objects.
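To make the idea concrete, here is a minimal single-machine sketch of a sampling-based Shapley approximation, assuming that "Shapley IME" refers to the Monte Carlo approach of averaging marginal contributions over random feature orderings and random background rows. It is not the Spark implementation described in this talk: the function name `shapley_ime`, its parameters, and the toy `linear_model` are all hypothetical and chosen only for illustration.

```python
import random


def shapley_ime(predict, x, background, feature, n_samples=1000, seed=0):
    """Approximate the Shapley contribution of one feature for instance x.

    predict    -- callable taking a list of feature values, returning a float
    x          -- the instance to explain (list of feature values)
    background -- list of reference instances to sample replacement values from
    feature    -- index of the feature whose contribution is estimated
    """
    rng = random.Random(seed)
    n_features = len(x)
    total = 0.0
    for _ in range(n_samples):
        # Draw a random feature ordering and a random background row.
        order = list(range(n_features))
        rng.shuffle(order)
        z = rng.choice(background)

        # Features that come before the target feature in this ordering.
        preceding = set(order[:order.index(feature)])

        # Hybrid instance 1: preceding features and the target come from x.
        x_with = [x[i] if (i in preceding or i == feature) else z[i]
                  for i in range(n_features)]
        # Hybrid instance 2: only the preceding features come from x.
        x_without = [x[i] if i in preceding else z[i]
                     for i in range(n_features)]

        # Marginal contribution of the target feature for this draw.
        total += predict(x_with) - predict(x_without)
    return total / n_samples


# Toy usage (hypothetical model): a linear function of three features.
def linear_model(v):
    return 2.0 * v[0] + 3.0 * v[1] - 1.0 * v[2]


contribution = shapley_ime(
    linear_model,
    x=[1.0, 2.0, 3.0],
    background=[[0.0, 0.0, 0.0]],
    feature=1,
    n_samples=100,
)
```

Each sample costs two model evaluations and is independent of the others, which replaces the exponential enumeration of feature subsets with a cost linear in the number of samples. That independence is also what makes this kind of estimator a natural candidate for distribution across a cluster, though how the work is actually partitioned in Spark is not specified here.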