
Predictive maintenance for industrial equipment

A failure prediction pipeline using XGBoost with survival analysis and SHAP explanations, trained on 15K sensor readings from 50 machines with an 8% failure rate.

Predictive maintenance, XGBoost, Survival analysis, SHAP, Sensor data, Cost optimization
0.94 AUC-ROC, best model (XGBoost)

Interactive dashboard

Five-page Streamlit application for machine health monitoring and maintenance planning

Machine health
  • Status indicators: green, yellow, red per machine
  • Failure probability scatter across all 50 machines
  • Sortable detail table with sensor readings
Failure timeline
  • Per-machine failure probability over time
  • Multi-select machine comparison
  • Decision threshold reference line
Sensor trends
  • Temperature, vibration, pressure, RPM charts
  • Power consumption over time
  • Per-machine sensor statistics
Maintenance scheduler
  • Adjustable downtime and maintenance costs
  • Cost-optimized threshold selection
  • Priority-ranked maintenance schedule
Feature importance
  • Global SHAP feature importance ranking
  • Model comparison table and ROC curves
  • Individual reading explanations with waterfall contributions
$ pip install -r requirements.txt && streamlit run app.py

Key results

XGBoost on 15K sensor readings from 50 industrial machines

  • AUC-ROC: 0.94
  • Recall (failures caught): 91%
  • Precision: 78%
  • PR-AUC: 0.76

Methodology

The dataset is synthetic sensor data (15K readings, 50 machines, 8% failure rate) covering temperature, vibration, pressure, RPM, and power consumption. Rolling 24-hour aggregations and interaction features are computed per machine. Four classifiers (Logistic Regression, Random Forest, XGBoost, Gradient Boosting) are trained with 5-fold stratified cross-validation. A Weibull AFT survival model estimates remaining useful life per machine, and SHAP TreeExplainer provides per-reading explanations. A cost-based threshold optimizer weighs unplanned downtime ($15K per false negative) against preventive maintenance ($1.5K per false positive) and selects the decision threshold that minimizes total maintenance cost.
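The per-machine rolling aggregation mentioned above can be sketched as follows. This is a minimal illustration using only the standard library, not the project's code: the function name `rolling_mean_per_machine`, the `(machine_id, timestamp, value)` tuple layout, and the sample readings are all assumptions, and a real pipeline would more likely use a dataframe library.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def rolling_mean_per_machine(readings, window=timedelta(hours=24)):
    """Trailing-window mean for each reading, computed separately per machine.

    `readings` is an iterable of (machine_id, timestamp, value) tuples sorted
    by timestamp. The window is (t - window, t], so it includes the current
    reading. (Hypothetical helper, for illustration only.)
    """
    buffers = defaultdict(deque)  # machine_id -> (timestamp, value) pairs in window
    sums = defaultdict(float)     # machine_id -> running sum of buffered values
    out = []
    for machine_id, ts, value in readings:
        buf = buffers[machine_id]
        buf.append((ts, value))
        sums[machine_id] += value
        # Drop readings that have fallen out of the trailing window.
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            sums[machine_id] -= old_value
        out.append((machine_id, ts, sums[machine_id] / len(buf)))
    return out


# Made-up temperature readings for two machines.
t0 = datetime(2024, 1, 1)
readings = [
    ("M01", t0, 68.0),
    ("M02", t0, 70.0),
    ("M01", t0 + timedelta(hours=12), 72.0),
    ("M01", t0 + timedelta(hours=30), 92.0),
]
features = rolling_mean_per_machine(readings)
```

Keeping one buffer per machine is what stops readings from one machine leaking into another machine's window; the same idea applies to other aggregations such as rolling max or standard deviation.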

  • Sensor data: 15K readings, 50 machines
  • Model training: LR, RF, XGB, GBM
  • Survival + SHAP: Weibull AFT, TreeExplainer
  • Cost optimization: $15K downtime vs $1.5K PM
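The cost trade-off between missed failures and unnecessary maintenance can be sketched as a simple grid search over thresholds. This is a hedged illustration rather than the project's optimizer: the function names `total_cost` and `best_threshold`, the 0.01-step grid, and the toy labels and probabilities are assumptions. Only the $15K and $1.5K costs come from the text.

```python
def total_cost(y_true, y_prob, threshold, fn_cost=15_000, fp_cost=1_500):
    """Total cost of missed failures (FN) plus unnecessary maintenance (FP)."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return fn * fn_cost + fp * fp_cost


def best_threshold(y_true, y_prob, fn_cost=15_000, fp_cost=1_500, steps=99):
    """Return the grid threshold with the lowest total cost (first one on ties)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]  # 0.01 .. 0.99
    return min(grid, key=lambda t: total_cost(y_true, y_prob, t, fn_cost, fp_cost))


# Toy example: 1 = failure, 0 = normal; probabilities are made up.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60, 0.70, 0.90]

threshold = best_threshold(y_true, y_prob)
cost = total_cost(y_true, y_prob, threshold)
```

Because a missed failure costs ten times as much as a preventive visit, the minimum-cost threshold tends to sit well below 0.5. If the predicted probabilities are well calibrated and only these two costs apply, the break-even point is fp_cost / (fp_cost + fn_cost) = 1.5 / 16.5, roughly 0.09.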

How to run

$ git clone https://github.com/guydev42/predictive-maintenance.git
$ cd calgary-data-portfolio/project_21_predictive_maintenance
$ pip install -r requirements.txt
$ python data/generate_data.py
$ streamlit run app.py

Data source

The data is synthetic industrial sensor data drawn from distributions modeled on common failure patterns in rotating equipment. Normal operation shows steady-state temperatures around 68 °C and low vibration, while pre-failure readings show elevated temperatures (mean 92 °C), increased vibration intensity, and pressure drops. Machine-level attributes (age, operating hours, maintenance history) add context. The 8% failure rate is chosen to reflect typical industrial equipment failure incidence over a 7-day horizon.
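A generator along these lines could be sketched as below. This is not the contents of `data/generate_data.py`: the function name `generate_readings`, the 5 °C standard deviation, the independent per-reading failure draw, and the field names are assumptions made for illustration. Only the 15K readings, 50 machines, 8% failure rate, and the 68 °C and 92 °C means come from the description. Vibration, pressure, RPM, power, and machine-level attributes are omitted because no numeric values are given for them.

```python
import random


def generate_readings(n=15_000, n_machines=50, failure_rate=0.08, seed=42):
    """Generate toy temperature readings with a binary pre-failure label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        failing = rng.random() < failure_rate  # assumed: independent per reading
        mean_temp = 92.0 if failing else 68.0  # means taken from the description
        rows.append({
            "machine_id": rng.randrange(n_machines),
            "temperature_c": rng.gauss(mean_temp, 5.0),  # assumed std of 5 °C
            "failure": int(failing),
        })
    return rows


readings = generate_readings()
observed_rate = sum(r["failure"] for r in readings) / len(readings)
```

A more faithful generator would probably simulate gradual degradation per machine over time rather than drawing each reading independently; the sketch only shows how the stated means and failure rate translate into sampled data.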
