by Dan Morris, Hector Leano and Steve Sobel
Check out the solution accelerator to download the notebooks referred to throughout this blog.
When T-Mobile embraced the un-carrier label, they didn’t just kick off a marketing campaign; they fundamentally changed the dynamics in the US market for telecom. Previously, telecom had been a staid, utility-like industry with steady growth and subscribers locked into two-year contracts to cover a “free” handset with a phone plan. But three factors changed the nature of the business:
These rapidly changing dynamics have moved telco providers from being utilities to being value-added services providers across multiple lines of business, including broadband, security, cable, and streaming video services. This shift, along with increased competition from new entrants, has accelerated communications service providers’ investment in personalized, frictionless customer experiences across all channels, at all times. Core to building these experiences is understanding where existing customers are in the subscription life cycle and, in particular, identifying those most at risk of churn. Reducing churn remains one of the most strategic areas of focus for every provider, and the goal of many churn-reduction initiatives is to predict customer life cycle events and find ways to extend the life cycle profitably.
Based on best practices from our work with leading communications service providers, we’ve developed solution accelerators for common analytics and machine learning use cases to save weeks or months of development time for your data engineers and data scientists.
This solution accelerator complements our work on customer lifetime value, attrition for subscription services, and profitable customer retention, but with a telco-specific lens.
Using sample telco datasets from IBM and the Lifelines library, this solution accelerator will:
The contents of this solution accelerator are contained in Databricks notebooks that are linked to at the end of this post.
Survival analysis is a collection of statistical methods used to examine and predict the time until an event of interest occurs. This form of analysis originated in healthcare, with a focus on time to death. Since then, survival analysis has been successfully applied to use cases in virtually every industry around the globe.
In telco specifically, use cases include:
In contrast to other methods that may seem similar on the surface, such as linear regression, survival analysis takes censoring into account. Censoring occurs when the start and/or end of the period being measured is unknown. For example, suppose our historical data includes records for the two customers below. In the case of customer A, we know the precise duration of the subscription because the customer churned in December 2020. For customer B, we know that the contract started four months ago and is still active, but we do not know how much longer they will be a customer. This is an example of right censoring, because we do not yet know when customer B's subscription will end. Right censoring is what we most commonly see with this form of analysis.
| Customer | Subscription Start Date | Subscription End Date | Subscription Duration | Active Subscription Flag |
|---|---|---|---|---|
| A | Feb 3, 2020 | Dec 2, 2020 | 10 months | 0 |
| B | Nov 11, 2020 | - | 4 months | 1 |
As illustrated above, we could simply use a duration of four months for customer B, but that would treat B as if they had churned after four months and lead us to underestimate survival time. Survival analysis avoids this problem because it takes censoring into account.
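To make the censoring idea concrete, here is a minimal Python sketch that turns the two rows above into the duration/event pairs that survival libraries such as lifelines take as input. The observation date (March 11, 2021, implied by customer B being four months in) and the conversion from days to months using an average month length are assumptions for illustration only.

```python
from datetime import date

# Assumption: customer B has been active for about four months, so we treat
# March 11, 2021 as the hypothetical date the data was extracted.
OBSERVATION_DATE = date(2021, 3, 11)
AVG_DAYS_PER_MONTH = 365.25 / 12

customers = [
    {"customer": "A", "start": date(2020, 2, 3), "end": date(2020, 12, 2)},
    {"customer": "B", "start": date(2020, 11, 11), "end": None},  # still active
]


def to_survival_record(row, as_of):
    """Convert a subscription row into a (duration, event) survival record."""
    churned = row["end"] is not None
    # Censored customers are measured up to the observation date, not an end date.
    stop = row["end"] if churned else as_of
    duration_months = round((stop - row["start"]).days / AVG_DAYS_PER_MONTH)
    # event = 1 means churn was observed; event = 0 means right-censored.
    return {"customer": row["customer"], "duration": duration_months, "event": int(churned)}


records = [to_survival_record(row, OBSERVATION_DATE) for row in customers]
for r in records:
    print(r)
```

Customer B keeps `event = 0`: a survival model treats those four months as a lower bound on B's lifetime rather than as a completed lifetime, which is exactly what a plain regression on duration cannot express.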
After accounting for censoring, the key output of a survival analysis machine learning model is a survival probability curve. As shown below, a survival probability curve plots time on the x-axis and survival probability on the y-axis. Starting at 0 months, the chart can be read as: the probability of a customer staying at least 0 months is 100%, represented by the point (0, 1.0). Likewise, moving down the curve to the median survival time (34 months) shows that a customer has a 50% probability of staying at least 34 months. Each point on the curve is measured from the start of the subscription, but it is built up from conditional probabilities, such as the probability that a customer stays past month 34 given that they have already stayed 33 months. Multiplying these step-by-step conditional probabilities together yields the overall survival probability at each point in time.
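The product-of-conditional-probabilities idea is the Kaplan-Meier estimator. The following from-scratch Python sketch shows the mechanics; in practice you would likely use a library implementation such as lifelines' `KaplanMeierFitter`. At each churn time it multiplies the running survival probability by the probability of surviving past that time given the customer is still at risk, and censored customers count toward the risk set until they drop out. The toy data is hypothetical and unrelated to the 34-month curve described above.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve as a list of (time, S(time)) steps.

    durations: time observed for each customer (e.g., months)
    events:    1 if churn was observed, 0 if the customer is right-censored
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    survival = 1.0
    curve = [(0, 1.0)]  # everyone stays at least 0 months
    i = 0
    while i < len(records):
        t = records[i][0]
        churned = censored = 0
        # Group all customers whose observation ends at time t.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                churned += 1
            else:
                censored += 1
            i += 1
        if churned:
            # Conditional probability of surviving past t, given at risk at t,
            # multiplied into the running survival probability.
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        # Churned and censored customers both leave the risk set after t.
        at_risk -= churned + censored
    return curve


def median_survival_time(curve):
    """Smallest time at which the survival probability drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return float("inf")  # the curve never reaches 0.5


# Hypothetical toy data: months observed and whether churn was observed.
durations = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")
print("median survival time:", median_survival_time(curve))
```

Note how the customer censored at month 3 still counts as at risk for the churn at month 3, then leaves the risk set, so later steps divide by a smaller denominator instead of treating that customer as churned.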
Visualizing survival probability curves is particularly helpful when building a model and/or analyzing a model for inference. In many cases, however, the end goal is to use the output of a survival analysis model as an input for another model. For example, in this solution accelerator, we use the output of a survival analysis model as an input for calculating customer lifetime value. We then build an application that provides visibility into the net present value for a given cohort of users over a three-year time horizon. This is powerful because it enables marketers to understand what the payback period will be for various new customer acquisition campaigns. Similarly, one could use the output of the survival analysis model we build in this solution accelerator to align marketing messages to where consumers are in their customer journey.
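As a rough illustration of how a survival curve can feed a lifetime value calculation, the Python sketch below weights each month's margin by the probability the customer is still subscribed, discounts it, accumulates the result over a 36-month horizon, and finds the payback month for a given acquisition cost. The margin, discount rate, acquisition cost, and survival curve are all hypothetical, and this simple discounting scheme is an assumption rather than necessarily the exact method used in the accelerator.

```python
def cumulative_expected_npv(survival, monthly_margin, annual_discount_rate):
    """Cumulative expected NPV per customer for months 1..len(survival).

    survival[m - 1] is the probability the customer is still subscribed in month m.
    """
    monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1
    running, cumulative = 0.0, []
    for month, s in enumerate(survival, start=1):
        # Expected margin this month = P(still a customer) * margin, then discounted.
        running += s * monthly_margin / (1 + monthly_rate) ** month
        cumulative.append(running)
    return cumulative


def payback_month(cumulative, acquisition_cost):
    """First month in which cumulative expected NPV covers the acquisition cost."""
    for month, value in enumerate(cumulative, start=1):
        if value >= acquisition_cost:
            return month
    return None  # not paid back within the horizon


# Hypothetical inputs: a three-year horizon and a made-up survival curve.
HORIZON_MONTHS = 36
survival = [0.98 ** m for m in range(1, HORIZON_MONTHS + 1)]
cumulative = cumulative_expected_npv(survival, monthly_margin=50.0, annual_discount_rate=0.10)
month = payback_month(cumulative, acquisition_cost=300.0)
print(f"3-year expected NPV per customer: {cumulative[-1]:.2f}")
print("payback month:", month)
```

Swapping in the survival curve for a specific cohort or campaign, rather than the made-up one here, is what lets the payback period differ from campaign to campaign.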
In practice, the reference architecture that enables these types of use cases in production resembles the following:
The goal of this solution accelerator is to help you leverage survival analysis for your own customer retention use case as quickly as possible. As such, this solution accelerator contains an in-depth review of commonly used methods: Kaplan-Meier, Cox Proportional Hazards, and Accelerated Failure Time. Get started today by importing this solution accelerator directly into your Databricks workspace. You can also view our on-demand webinar on Survival Analysis in Telecommunications.
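For orientation, the three methods model survival in different ways. In the standard textbook formulations below, d_i is the number of churn events at time t_i, n_i is the number of customers still at risk just before t_i, h_0(t) is a baseline hazard, x is a customer's covariate vector, beta holds the fitted coefficients, and S_0 is the baseline survival function of exp(beta_0 + sigma * epsilon), i.e., survival when x = 0. In lifelines, these roughly correspond to `KaplanMeierFitter`, `CoxPHFitter`, and parametric AFT fitters such as `WeibullAFTFitter`; the exact parameterizations in the accelerator's notebooks may differ.

```latex
% Kaplan-Meier: nonparametric product of conditional survival probabilities
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)

% Cox Proportional Hazards: covariates scale a shared baseline hazard
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^\top x\right)

% Accelerated Failure Time: covariates stretch or shrink time itself
\log T = \beta_0 + \beta^\top x + \sigma \varepsilon
\quad\Longleftrightarrow\quad
S(t \mid x) = S_0\!\left(t\, e^{-\beta^\top x}\right)
```

One practical consequence: a positive coefficient means higher churn risk (shorter lifetimes) in the Cox model but longer expected lifetimes in this AFT formulation, so signs should be read carefully when comparing the two.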