Digital Attribution Modeling Using Apache Spark

A digital attribution model determines how credit for an online conversion is assigned to the media touch points in the conversion path. It helps marketers understand the effectiveness of media touches and often serves as a foundation for online media optimization. Based on Spark, we built an attribution system with three modules: model training, attribution, and insight generation. The attribution model takes its basic form from logistic regression, but adds two types of parameters, one addressing time decay effects and the other attribution weights. As a result, the logistic regression module in MLlib cannot fit this model directly, so we developed a new modeling algorithm in Spark to address this problem. The system also employs statistical modeling and text processing techniques such as survival analysis, causal modeling, and tokenization.

Takeaways:
1. Spark is the right choice for building large-scale attribution systems.
2. It is possible to customize MLlib algorithms for special business needs.
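To make the model structure described above more concrete, here is a minimal sketch in plain Python. The abstract does not give the functional form, so everything here is an assumption for illustration: each touch contributes `weight * exp(-decay_rate * age_days)` to the logit, parameters are keyed by channel, and credit is split in proportion to each touch's contribution. The function names and the example values are hypothetical and are not taken from the talk. Under this assumed form, the decay rate sits inside an exponential, so the model is not linear in its parameters, which is one plausible reason a standard logistic regression fitter would not apply directly.

```python
import math

# ASSUMPTION: a touch's contribution to the logit is its channel's
# attribution weight, discounted exponentially by the touch's age.
def touch_contribution(weight, decay_rate, age_days):
    return weight * math.exp(-decay_rate * age_days)

# ASSUMPTION: conversion probability is a logistic function of the
# intercept plus the summed, time-decayed contributions of all touches.
# `path` is a list of (channel, age_days) pairs.
def conversion_probability(path, weights, decay_rates, intercept):
    z = intercept + sum(
        touch_contribution(weights[ch], decay_rates[ch], age)
        for ch, age in path
    )
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMPTION: credit is shared in proportion to each touch's decayed
# contribution. Returns one fraction per touch, in path order.
def attribute_credit(path, weights, decay_rates):
    contributions = [
        touch_contribution(weights[ch], decay_rates[ch], age)
        for ch, age in path
    ]
    total = sum(contributions)
    if total <= 0:
        # No positive evidence to split; assign no credit.
        return [0.0] * len(path)
    return [c / total for c in contributions]

# Hypothetical example: a display ad seen 7 days before conversion,
# followed by a search click 1 day before conversion.
example_path = [("display", 7.0), ("search", 1.0)]
example_weights = {"display": 0.5, "search": 1.2}
example_decay = {"display": 0.3, "search": 0.1}

example_credit = attribute_credit(example_path, example_weights, example_decay)
example_prob = conversion_probability(
    example_path, example_weights, example_decay, intercept=-2.0
)
```

In a Spark setting, a scoring function of this kind would typically be applied per conversion path across a distributed dataset, while the weights and decay rates would be estimated by a custom optimization routine; the sketch leaves both of those steps out.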