
Case study 14

Predicting customer churn before it happens

How we identified 85% of at-risk customers and projected $141K in annual savings

Tags: Telecom · Classification · SHAP · XGBoost · LightGBM

Every lost customer is $900 walking out the door

A telecom provider with 5,000 subscribers was losing roughly one in four customers every year. Each lost customer represented $900 in lifetime value — and the company had no way to predict who would leave next.

Their retention team was operating blind: blanket discount offers went out to everyone, wasting budget on customers who were never going to leave while missing the ones who were about to.

The question was simple: can we identify which customers are likely to churn in the next 30 days, so the retention team can intervene with the right offer at the right time?

What we had to work with

Customers: 5,000 subscribers
Features: 21 attributes per customer
Key signals: tenure, contract type, monthly charges, payment method, internet service, add-on services
Target variable: Churn (Yes / No), 25.7% positive rate
Data quality: 30 missing values in total_charges (0.6%), imputed as monthly charges × tenure
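The imputation rule above follows directly from the billing relationship: a customer's total charges are approximately their monthly charges accumulated over their tenure. A minimal sketch in pandas, assuming columns named `tenure`, `monthly_charges`, and `total_charges` (the real schema may differ):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; column names are assumptions.
df = pd.DataFrame({
    "tenure": [3, 24, 48],
    "monthly_charges": [95.0, 60.0, 40.0],
    "total_charges": [285.0, np.nan, 1920.0],
})

# Impute missing total_charges as monthly_charges × tenure,
# the rule described in the data-quality row above.
mask = df["total_charges"].isna()
df.loc[mask, "total_charges"] = (
    df.loc[mask, "monthly_charges"] * df.loc[mask, "tenure"]
)
```

This keeps the imputed values consistent with the two columns they are derived from, rather than pulling them toward a global mean.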

Key insight

Month-to-month fiber customers with electronic check payment churn at 3.3x the rate of two-year contract customers. This single segmentation insight is worth more than any model.

What we built from the raw data

tenure_group

Buckets customers into lifecycle stages: 0–12, 13–24, 25–48, 49–72 months. Churn behavior differs dramatically by stage — a 3-month customer and a 48-month customer are fundamentally different risk profiles.

charges_per_month_tenure

Monthly charges normalized by tenure length. Captures price sensitivity at different lifecycle stages — a new customer paying $95/month is far more likely to leave than a 5-year customer at the same rate.

has_support_services

Count of active support add-ons: online security, backup, device protection, tech support. Customers with more services are stickier — each additional service increases switching cost.
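The three engineered features above can be sketched in a few lines of pandas. Column names and the "Yes"/"No" encoding of the add-on columns are assumptions about the raw schema:

```python
import pandas as pd

# Toy rows standing in for the raw customer table.
df = pd.DataFrame({
    "tenure": [3, 20, 40, 60],
    "monthly_charges": [95.0, 70.0, 55.0, 45.0],
    "online_security": ["No", "Yes", "Yes", "Yes"],
    "online_backup": ["No", "No", "Yes", "Yes"],
    "device_protection": ["No", "No", "Yes", "Yes"],
    "tech_support": ["No", "Yes", "No", "Yes"],
})

# Lifecycle buckets: 0-12, 13-24, 25-48, 49-72 months.
df["tenure_group"] = pd.cut(
    df["tenure"], bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
)

# Price pressure relative to how long the customer has stayed.
df["charges_per_month_tenure"] = df["monthly_charges"] / df["tenure"]

# Count of active support add-ons (stickiness / switching cost).
support_cols = [
    "online_security", "online_backup", "device_protection", "tech_support",
]
df["has_support_services"] = (df[support_cols] == "Yes").sum(axis=1)
```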

Four models, one winner

We trained and tuned four classifiers with 5-fold stratified cross-validation, optimizing for AUC-ROC — the metric that best captures a model's ability to rank customers by risk.

Logistic Regression: AUC 0.711. Fast and interpretable, but misses complex interactions.
Random Forest: AUC 0.699. Strong learner, but overfits on this dataset.
XGBoost: AUC 0.711. Solid; matches logistic regression on AUC.
LightGBM: AUC 0.713. Best overall, chosen for deployment.
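The evaluation loop above can be sketched with scikit-learn. This is a minimal stand-in, not the project's actual pipeline: the data is synthetic, and `GradientBoostingClassifier` substitutes for XGBoost/LightGBM to keep the dependency to scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn table (~25% positive rate).
X, y = make_classification(
    n_samples=1000, n_features=21, weights=[0.75], random_state=42,
)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stands in for the XGBoost/LightGBM boosted-tree family.
    "boosting": GradientBoostingClassifier(random_state=42),
}

# 5-fold stratified CV, scored on AUC-ROC, as described above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
```

Stratification matters here: with a ~25% positive rate, unstratified folds can drift far enough from the base rate to distort both training and the AUC estimate.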

Why the model predicts what it predicts

A model is only useful if the business can trust it. SHAP values decompose every prediction into the contribution of each feature — making the black box transparent.

Global feature importance

Across all customers, the top five factors driving churn predictions:

1. Contract type (month-to-month)
2. Tenure (short)
3. Monthly charges (high)
4. Internet service (fiber optic)
5. Payment method (electronic check)

Individual prediction

Here is one specific customer the model flagged:

Customer #4721: 78% churn probability
Gender: Female
Tenure: 3 months
Contract: Month-to-month
Internet: Fiber optic
Monthly charges: $95.00
Payment: Electronic check
Tech support: None
Online security: None

SHAP contribution breakdown

Contract (month-to-month) +0.22
Tenure (3 months) +0.18
No tech support +0.12
Electronic check payment +0.09
Monthly charges ($95) +0.08

Recommended action

Offer a 12-month contract at 15% discount with a free tech support bundle. Expected retention lift: 30%. Revenue preserved: $270 net CLV.

The bottom line

If we intervene on every customer the model flags above the optimal threshold:

Optimal threshold: 0.200
True churners caught: 219 / 257 (85.2%)
False alarms: 398 customers flagged who would not have churned
Intervention cost: $50 per customer contacted
Expected revenue per caught churner: $900 CLV × 30% retention rate = $270
Total intervention cost: $30,850 (617 customers × $50)
Total revenue saved: $59,130 (219 churners × $270)
Net savings: $28,280 on the test set
Projected annual savings: $141,400
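The arithmetic behind these figures is worth making explicit, since the ×5 annualization rests on the test set being roughly a 20% split (an assumption inferred from the numbers, not stated in the source):

```python
# Reproduce the cost-benefit arithmetic from the table above.
CLV = 900                 # lifetime value per customer, $
retention_lift = 0.30     # probability an intervention retains a churner
contact_cost = 50         # cost per customer contacted, $

true_churners_caught = 219
false_alarms = 398
flagged = true_churners_caught + false_alarms        # 617 customers contacted

intervention_cost = flagged * contact_cost           # $30,850
net_value_per_churner = int(CLV * retention_lift)    # $270 expected per caught churner
revenue_saved = true_churners_caught * net_value_per_churner  # $59,130
net_savings = revenue_saved - intervention_cost      # $28,280

# Assumption: test set ~20% of the base, so annualize by a factor of 5.
projected_annual = net_savings * 5                   # $141,400
```

Note the asymmetry the 0.200 threshold exploits: a false alarm costs $50, while a missed churner forgoes $270 in expected value, so it pays to flag aggressively.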

Decision matrix

The interventions pay for themselves once they reach about 115 true churners (115 × $270 ≈ $31,050 in expected revenue against $30,850 in contact costs). At 85% recall, the model reaches 219 on the test set alone.

What I would do next

01
Deploy as a weekly batch scoring pipeline

Flag the top 100 at-risk customers every Monday morning. Push results directly into the CRM so the retention team can act immediately.

02
A/B test the retention offers

Which offer works best for which segment? Discount vs. service upgrade vs. contract incentive — test systematically, measure lift.

03
Add behavioral features

App usage patterns, call center contact frequency, payment history trends. These real-time signals could push recall above 90%.

04
Build a customer lifetime value model

Not all churners are equal. Prioritize retention spend on high-value customers where the ROI of intervention is highest.

05
Monitor model drift

As the customer base evolves, retrain quarterly and track AUC degradation. Set alerts when performance drops below the break-even threshold.

Stack: Python, scikit-learn, XGBoost, LightGBM, SHAP, pandas, matplotlib
Evaluation: 5-fold stratified cross-validation, AUC-ROC as primary metric
Tuning: GridSearchCV across all four model families
Code: View on GitHub
Data: synthetic dataset of 5,000 customers with realistic telecom feature distributions