
Case study 14

Predicting customer churn before it happens

How we identified 85% of at-risk customers and projected $141K in annual savings

Tags: Telecom · Classification · SHAP · XGBoost · LightGBM

Every lost customer is $900 walking out the door

A telecom provider with 5,000 subscribers was losing roughly one in four customers every year. Each lost customer represented $900 in lifetime value — and the company had no way to predict who would leave next.

Their retention team was operating blind: blanket discount offers went out to everyone, wasting budget on customers who were never going to leave while missing the ones who were about to.

The question was simple: can we identify which customers are likely to churn in the next 30 days, so the retention team can intervene with the right offer at the right time?

What we had to work with

Customers: 5,000 subscribers
Features: 21 attributes per customer
Key signals: tenure, contract type, monthly charges, payment method, internet service, add-on services
Target variable: Churn (Yes / No), 25.7% positive rate
Data quality: 30 missing values in total_charges (0.6%), imputed as monthly charges × tenure
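The imputation rule above follows directly from the billing relationship: a customer's total charges are approximately their monthly charges accumulated over their tenure. A minimal sketch in pandas, assuming columns named `tenure`, `monthly_charges`, and `total_charges` (the real schema may differ):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; column names are assumptions.
df = pd.DataFrame({
    "tenure": [3, 24, 48],
    "monthly_charges": [95.0, 60.0, 40.0],
    "total_charges": [285.0, np.nan, 1920.0],
})

# Impute missing total_charges as monthly_charges × tenure,
# the rule described in the data-quality row above.
mask = df["total_charges"].isna()
df.loc[mask, "total_charges"] = (
    df.loc[mask, "monthly_charges"] * df.loc[mask, "tenure"]
)
```

This keeps the imputed values consistent with the two columns they are derived from, rather than pulling them toward a global mean.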

Key insight

Month-to-month fiber customers with electronic check payment churn at 3.3x the rate of two-year contract customers. This single segmentation insight is worth more than any model.

What we built from the raw data

tenure_group

Buckets customers into lifecycle stages: 0–12, 13–24, 25–48, 49–72 months. Churn behavior differs dramatically by stage — a 3-month customer and a 48-month customer are fundamentally different risk profiles.

charges_per_month_tenure

Monthly charges normalized by tenure length. Captures price sensitivity at different lifecycle stages — a new customer paying $95/month is far more likely to leave than a 5-year customer at the same rate.

has_support_services

Count of active support add-ons: online security, backup, device protection, tech support. Customers with more services are stickier — each additional service increases switching cost.
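The three engineered features above can be sketched in a few lines of pandas. Column names and the "Yes"/"No" encoding of the add-on columns are assumptions about the raw schema:

```python
import pandas as pd

# Toy rows standing in for the raw customer table.
df = pd.DataFrame({
    "tenure": [3, 20, 40, 60],
    "monthly_charges": [95.0, 70.0, 55.0, 45.0],
    "online_security": ["No", "Yes", "Yes", "Yes"],
    "online_backup": ["No", "No", "Yes", "Yes"],
    "device_protection": ["No", "No", "Yes", "Yes"],
    "tech_support": ["No", "Yes", "No", "Yes"],
})

# Lifecycle buckets: 0-12, 13-24, 25-48, 49-72 months.
df["tenure_group"] = pd.cut(
    df["tenure"], bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
)

# Price pressure relative to how long the customer has stayed.
df["charges_per_month_tenure"] = df["monthly_charges"] / df["tenure"]

# Count of active support add-ons (stickiness / switching cost).
support_cols = [
    "online_security", "online_backup", "device_protection", "tech_support",
]
df["has_support_services"] = (df[support_cols] == "Yes").sum(axis=1)
```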

Four models, one winner

We trained and tuned four classifiers with 5-fold stratified cross-validation, optimizing for AUC-ROC — the metric that best captures a model's ability to rank customers by risk.

Logistic Regression: AUC 0.711. Fast and interpretable, but misses complex interactions.
Random Forest: AUC 0.699. Strong learner, but overfits on this dataset.
XGBoost: AUC 0.711. Solid; matches logistic regression on AUC.
LightGBM: AUC 0.713. Best overall, chosen for deployment.
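The evaluation loop above can be sketched with scikit-learn. This is a minimal stand-in, not the project's actual pipeline: the data is synthetic, and `GradientBoostingClassifier` substitutes for XGBoost/LightGBM to keep the dependency to scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn table (~25% positive rate).
X, y = make_classification(
    n_samples=1000, n_features=21, weights=[0.75], random_state=42,
)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stands in for the XGBoost/LightGBM boosted-tree family.
    "boosting": GradientBoostingClassifier(random_state=42),
}

# 5-fold stratified CV, scored on AUC-ROC, as described above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
```

Stratification matters here: with a ~25% positive rate, unstratified folds can drift far enough from the base rate to distort both training and the AUC estimate.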

Why the model predicts what it predicts

A model is only useful if the business can trust it. SHAP values decompose every prediction into the contribution of each feature — making the black box transparent.

Global feature importance

Across all customers, the top five factors driving churn predictions:

1. Contract type (month-to-month)
2. Tenure (short)
3. Monthly charges (high)
4. Internet service (fiber optic)
5. Payment method (electronic check)

Individual prediction

Here is one specific customer the model flagged:

Customer #4721: 78% churn probability
Gender: Female
Tenure: 3 months
Contract: Month-to-month
Internet: Fiber optic
Monthly charges: $95.00
Payment: Electronic check
Tech support: None
Online security: None

SHAP contribution breakdown

Contract (month-to-month) +0.22
Tenure (3 months) +0.18
No tech support +0.12
Electronic check payment +0.09
Monthly charges ($95) +0.08

Recommended action

Offer a 12-month contract at 15% discount with a free tech support bundle. Expected retention lift: 30%. Revenue preserved: $270 net CLV.

The bottom line

If we intervene on every customer the model flags above the optimal threshold:

Optimal threshold: 0.200
True churners caught: 219 / 257 (85.2%)
False alarms: 398 customers flagged who would not have churned
Intervention cost: $50 per customer contacted
Expected revenue per caught churner: $900 CLV × 30% retention rate = $270
Total intervention cost: $30,850 (617 customers × $50)
Total revenue saved: $59,130 (219 churners × $270)
Net savings: $28,280 on the test set
Projected annual savings: $141,400
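The arithmetic behind these figures is worth making explicit, since the ×5 annualization rests on the test set being roughly a 20% split (an assumption inferred from the numbers, not stated in the source):

```python
# Reproduce the cost-benefit arithmetic from the table above.
CLV = 900                 # lifetime value per customer, $
retention_lift = 0.30     # probability an intervention retains a churner
contact_cost = 50         # cost per customer contacted, $

true_churners_caught = 219
false_alarms = 398
flagged = true_churners_caught + false_alarms        # 617 customers contacted

intervention_cost = flagged * contact_cost           # $30,850
net_value_per_churner = int(CLV * retention_lift)    # $270 expected per caught churner
revenue_saved = true_churners_caught * net_value_per_churner  # $59,130
net_savings = revenue_saved - intervention_cost      # $28,280

# Assumption: test set ~20% of the base, so annualize by a factor of 5.
projected_annual = net_savings * 5                   # $141,400
```

Note the asymmetry the 0.200 threshold exploits: a false alarm costs $50, while a missed churner forgoes $270 in expected value, so it pays to flag aggressively.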

Decision matrix

The interventions pay for themselves once they reach about 115 true churners (115 × $270 ≈ $31,050 in expected revenue against $30,850 in contact costs). At 85% recall, the model reaches 219 on the test set alone.

What I would do next

01
Deploy as a weekly batch scoring pipeline

Flag the top 100 at-risk customers every Monday morning. Push results directly into the CRM so the retention team can act immediately.

02
A/B test the retention offers

Which offer works best for which segment? Discount vs. service upgrade vs. contract incentive — test systematically, measure lift.

03
Add behavioral features

App usage patterns, call center contact frequency, payment history trends. These real-time signals could push recall above 90%.

04
Build a customer lifetime value model

Not all churners are equal. Prioritize retention spend on high-value customers where the ROI of intervention is highest.

05
Monitor model drift

As the customer base evolves, retrain quarterly and track AUC degradation. Set alerts when performance drops below the break-even threshold.

Stack: Python, scikit-learn, XGBoost, LightGBM, SHAP, pandas, matplotlib
Evaluation: 5-fold stratified cross-validation, AUC-ROC as primary metric
Tuning: GridSearchCV across all four model families
Code: View on GitHub
Data: synthetic dataset of 5,000 customers with realistic telecom feature distributions