Shparkley: Scaling Shapley with Apache Spark - Databricks

The Shapley algorithm is a model interpretation algorithm that is well recognized in both industry and academia. However, its runtime complexity is exponential, and existing implementations take a very long time to generate feature contributions for even a single instance, so it has found limited practical use in industry. To explain model predictions at scale, we implemented the Shapley IME algorithm in Spark. To our knowledge, this is the first Spark implementation of the Shapley algorithm that scales to large datasets and works with most ML model objects.
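To make the idea concrete, here is a minimal single-machine sketch of the Monte Carlo sampling approach that IME-style Shapley approximations are based on. It is not the Spark implementation described in this talk. The function name `shapley_sampling`, its parameters, and the toy `linear_model` are hypothetical and chosen only for illustration. For each sample, the sketch draws a random background row and a random feature ordering, then measures how the prediction changes when the feature of interest switches from the background value to the instance's value.

```python
import random


def shapley_sampling(predict, x, background, feature, n_samples=1000, seed=0):
    """Estimate one feature's Shapley contribution by random sampling.

    Hypothetical helper for illustration; not the talk's Spark code.
    predict:    callable taking a list of feature values, returning a float
    x:          the instance to explain (list of feature values)
    background: list of reference rows, each shaped like x
    feature:    index of the feature whose contribution is estimated
    """
    rng = random.Random(seed)
    n_features = len(x)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice(background)        # random reference row
        order = list(range(n_features))
        rng.shuffle(order)                # random feature ordering
        pos = order.index(feature)
        from_x = set(order[:pos])         # features that precede `feature`
        # Preceding features come from x, all others from the reference row.
        x_minus = [x[i] if i in from_x else z[i] for i in range(n_features)]
        # Same row, except the feature of interest also takes its value from x.
        x_plus = list(x_minus)
        x_plus[feature] = x[feature]
        total += predict(x_plus) - predict(x_minus)
    return total / n_samples


def linear_model(row):
    # Toy model (an assumption for this example): 2*x0 + 3*x1.
    return 2.0 * row[0] + 3.0 * row[1]


background = [[0.0, 0.0]]
x = [1.0, 1.0]
phi0 = shapley_sampling(linear_model, x, background, feature=0)
phi1 = shapley_sampling(linear_model, x, background, feature=1)
print(phi0, phi1)
```

With a linear model and a single background row, every sample yields the same marginal difference, so the estimates equal the exact contributions (2.0 and 3.0 here). For a real model the estimate is noisy and improves with more samples. Because the samples are independent of one another, this kind of loop is a natural candidate for distribution across a cluster.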

About Xiang Huang


Xiang is a data scientist on the Data Science algorithms team at Affirm, working on production models to make real-time underwriting decisions as well as model interpretability. He is also interested in topics like distributed machine learning and machine learning servicing.

About Cristine Dewar


Cristine Dewar is a data scientist on the Data Science fraud team at Affirm. She is currently working on models to prevent fraud. Cristine is passionate about fair and explainable ML and using data science to improve lives.