To defend its mobile users against ad fraud, T-Mobile’s Marketing Solutions team needed to build a streamlined, scalable, end-to-end solution for the Marketing organization that could detect all types of advertising fraud. Using Apache Spark™, the team developed custom models and a unique algorithm that adapts to various app environments and user behaviors, but this initial solution was complex and costly to scale. After moving to Databricks on a cloud-based platform, the team saw remarkable results: a massive reduction in code runtime, the ability to detect ad fraud in real time using T-Mobile network data, and a user-friendly, customized dashboard that internal teams use to investigate, analyze risk, and report fraud.
In 2019, advertising fraud cost digital advertisers an estimated $23 billion, and those losses are only growing. T-Mobile needed a way to identify how and where advertising fraud affects ad placements and campaigns. Digital ads involve several moving pieces, with algorithms determining how much an ad will cost, who will see it, and which version they will see, all in less than a second. Despite that speed, each moving part gives fraudsters an opening to exploit, for example through bot farming or domain spoofing.
To develop an in-house solution that could identify fraud accurately and efficiently at scale, the team turned to T-Mobile’s vast network of devices and data. “We need to be able to see and track what’s going on in the billions of online ad transactions that come through our network,” said Eric Yatskowitz, a data scientist on T-Mobile Marketing Solutions. “We also need to use this data to develop a model that can continuously identify several different types of fraud, while scaling to handle 4–10 terabytes of data that T-Mobile collects every day.”
Because of the scale and performance required, T-Mobile’s data team initially tried running Apache Spark™ in an on-premises environment. However, they underestimated the complexity of building an end-to-end analytics infrastructure that could scale to meet their data needs. With this preliminary solution, they spent more time on DevOps and infrastructure than on the data itself.
The T-Mobile team decided to migrate to a cloud-based platform that would simplify operations and enable machine learning at scale, while still taking advantage of Spark’s speed for data processing.
T-Mobile’s data team chose Azure Databricks to power its internal Advertising Fraud Detection solution. The shift eliminated the complexities of managing infrastructure and maintaining clusters. This let the team improve efficiency and cut code runtime across the board, so they could focus on refining model accuracy, developing a proprietary algorithm to optimize fraud identification, and helping T-Mobile’s marketing organization make smart, data-driven decisions about risk mitigation and ad placement.
“Databricks has greatly simplified our analytics workflow so we can more easily develop models that identify and score behavioral data to indicate the likelihood of advertising fraud,” explained Yatskowitz.
To use machine learning models to accurately identify ad fraud signals, T-Mobile’s data scientists work in Databricks’ interactive workspace, where they can collaborate more efficiently and train models without infrastructure limits. With MLflow, they streamline the ML lifecycle by automating tasks and by monitoring and tuning model performance.
One model they developed uses an algorithm called Normalized Entropy. The metric captures expected behavior based on network events, so it can surface anomalies in that behavior and produce scores that indicate potentially fraudulent signals. Combined with additional metrics that account for variations across apps, traffic levels, and demographics, this algorithm delivers insights to the T-Mobile marketing team through a streamlined UI for assessing the potential risk of advertising on different websites and applications.
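The article names Normalized Entropy but does not give its formula. A common definition is Shannon entropy divided by its maximum possible value, H = -Σ pᵢ log pᵢ normalized by log(n) over n categories, so the sketch below assumes that reading. The category counts, app names, and the median-based `tolerance` check in the hypothetical `flag_anomalies` helper are illustrations of "surfacing anomalies" only, not T-Mobile's model, and the sketch uses plain Python rather than PySpark so it runs anywhere.

```python
import math
from statistics import median


def normalized_entropy(counts):
    """Shannon entropy of `counts` divided by its maximum, log(len(counts)).

    `counts` holds one entry per possible category (zeros allowed), so the
    result lies in [0, 1]: 0 when every event lands in one category and
    1 when events are spread evenly across all categories.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        return 0.0  # no spread is possible, so report minimal entropy
    h = 0.0
    for c in counts:
        if c > 0:  # by convention 0 * log(0) contributes nothing
            p = c / total
            h -= p * math.log(p)
    return h / math.log(n)


def flag_anomalies(app_counts, tolerance=0.3):
    """Flag apps whose normalized entropy is far from the peer median.

    Hypothetical scoring rule: `tolerance` is an assumed cutoff,
    not a value taken from the article.
    """
    scores = {app: normalized_entropy(c) for app, c in app_counts.items()}
    mid = median(scores.values())
    return {app: s for app, s in scores.items() if abs(s - mid) > tolerance}


# Hypothetical per-app event counts across four categories.
traffic = {
    "app_a": [120, 95, 110, 105],
    "app_b": [100, 130, 90, 115],
    "app_c": [98, 102, 125, 101],
    "app_d": [400, 0, 0, 2],  # traffic concentrated in one category
}
print(flag_anomalies(traffic))  # flags only app_d
```

In a Spark pipeline the per-app counts would more likely come from a grouped aggregation over network events. Whether low or high entropy is the suspicious direction depends on what the categories represent, which the article does not say.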
Building the end-to-end Advertising Fraud Detection service took time and effort, but the outcome was worth the investment. With Databricks as the foundation of its analytics and ML platform, the team can explore all of its data and accelerate its ability to innovate with AI.
“Databricks gives us the scalability and efficiency we need to build complex, end-to-end products using big data and machine learning. We were able to use PySpark to create code that runs significantly faster than it did previously, taking the total operation time from eight minutes to 23 seconds,” said Yatskowitz. “For a piece of code that runs every day, it’s getting far more hours of compute time over the course of the year — plus, we can track performance and risk anomaly detection without constant monitoring.”
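To put the quoted speedup in perspective, here is the arithmetic, assuming (as the quote implies but does not state exactly) that the job runs once a day, every day of the year:

$$\frac{8 \times 60\ \text{s}}{23\ \text{s}} = \frac{480}{23} \approx 20.9\times \text{ faster}$$

$$(480 - 23)\ \tfrac{\text{s}}{\text{day}} \times 365\ \text{days} = 166{,}805\ \text{s} \approx 46.3\ \text{hours of compute saved per year}$$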
Looking ahead, the T-Mobile team is confident it has the technology stack required to take full advantage of the massive volumes of data at its disposal, not only to improve ongoing efforts to thwart fraud and malicious behavior, but also to unlock new data-driven innovations that improve the experience of T-Mobile customers.
“Databricks gives us the scalability and efficiency we need to build complex, end-to-end products using big data and machine learning.”
– Eric Yatskowitz, Data Scientist, T-Mobile Marketing Solutions