A failure prediction pipeline using XGBoost with survival analysis and SHAP explanations, trained on 15K sensor readings from 50 machines with an 8% failure rate.
Five-page Streamlit application for machine health monitoring and maintenance planning
XGBoost on 15K sensor readings from 50 industrial machines
The pipeline runs on synthetic sensor data (15K readings, 50 machines, 8% failure rate) with features for temperature, vibration, pressure, RPM, and power consumption. Rolling 24-hour aggregations and interaction features are computed per machine. Four classifiers (Logistic Regression, Random Forest, XGBoost, Gradient Boosting) are trained with 5-fold stratified cross-validation. A Weibull AFT survival model estimates the remaining useful life of each machine, and SHAP TreeExplainer provides an explanation for each individual reading. A cost-based threshold optimizer weighs the cost of unplanned downtime ($15K per false negative) against the cost of preventive maintenance ($1.5K per false positive) and selects the decision threshold that minimizes total maintenance cost.
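The cost-based threshold search can be illustrated with a minimal sketch. It is not the project's actual optimizer: the function names `total_cost` and `best_threshold`, the 0.01-step grid, and the toy labels and probabilities are all assumptions made for illustration. Only the two cost figures come from the description above.

```python
from typing import List, Optional, Sequence, Tuple

COST_FN = 15_000  # unplanned downtime per missed failure (from the description)
COST_FP = 1_500   # preventive maintenance per false alarm (from the description)


def total_cost(y_true: Sequence[int], y_prob: Sequence[float], threshold: float,
               cost_fn: float = COST_FN, cost_fp: float = COST_FP) -> float:
    """Total cost when every reading with probability >= threshold is flagged."""
    # False negatives: real failures that were not flagged.
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    # False positives: healthy readings that were flagged.
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                   grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Scan a grid of thresholds and return (threshold, cost) with the lowest cost."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # hypothetical grid: 0.01 .. 0.99
    # Ties on cost resolve to the lowest threshold because tuples compare in order.
    cost, threshold = min((total_cost(y_true, y_prob, t), t) for t in grid)
    return threshold, cost


# Toy example (made-up values, not project data).
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.30, 0.60, 0.35, 0.80]
threshold, cost = best_threshold(y_true, y_prob)
print(f"threshold={threshold:.2f}, total cost=${cost:,.0f}")
```

Because a missed failure costs ten times as much as a false alarm, the search tends to settle on a threshold well below 0.5. If the predicted probabilities were well calibrated, the break-even point would be roughly 1.5 / (1.5 + 15) ≈ 0.09; scanning a grid on held-out predictions avoids relying on that calibration assumption.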
The synthetic industrial sensor data is drawn from realistic distributions modeled on common failure patterns in rotating equipment. Normal operation shows a steady-state temperature of around 68 °C and low vibration, while pre-failure readings show elevated temperature (mean 92 °C), stronger vibration, and pressure drops. Machine-level attributes (age, operating hours, maintenance history) add context. The 8% failure rate reflects typical failure incidence for industrial equipment over a 7-day horizon.
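A minimal sketch of how such readings could be generated, using only the standard library. The two temperature means, the 8% failure rate, and the 15K/50 sizes come from the description; everything else (the function name `make_readings`, the column names, the standard deviations, and the vibration and pressure values and their units) is a hypothetical choice for illustration and not the project's actual generator.

```python
import random
from statistics import mean


def make_readings(n_readings=15_000, n_machines=50, failure_rate=0.08, seed=0):
    """Generate synthetic readings; pre-failure rows run hotter, rougher, and at lower pressure."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rows = []
    for i in range(n_readings):
        failing = rng.random() < failure_rate
        if failing:
            temperature = rng.gauss(92.0, 5.0)  # mean from the text; std is assumed
            vibration = rng.gauss(6.0, 1.5)     # assumed values, arbitrary units
            pressure = rng.gauss(85.0, 8.0)     # assumed values, arbitrary units
        else:
            temperature = rng.gauss(68.0, 4.0)  # mean from the text; std is assumed
            vibration = rng.gauss(2.0, 0.5)     # assumed values, arbitrary units
            pressure = rng.gauss(100.0, 5.0)    # assumed values, arbitrary units
        rows.append({
            "machine_id": i % n_machines,       # round-robin assignment to machines
            "temperature_c": temperature,
            "vibration": vibration,
            "pressure": pressure,
            "failure_within_7d": int(failing),  # label over the 7-day horizon
        })
    return rows


readings = make_readings()
failing_rows = [r for r in readings if r["failure_within_7d"] == 1]
normal_rows = [r for r in readings if r["failure_within_7d"] == 0]
print(f"failure rate: {len(failing_rows) / len(readings):.3f}")
print(f"mean temperature, normal: {mean(r['temperature_c'] for r in normal_rows):.1f} C")
print(f"mean temperature, pre-failure: {mean(r['temperature_c'] for r in failing_rows):.1f} C")
```

Sampling each reading independently keeps the sketch short; a generator that models gradual degradation over time within each machine would be closer to real equipment behavior, and machine-level attributes such as age or maintenance history could be attached per `machine_id`.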