Project 09
Uses survival analysis and classification on 22,000+ Calgary business licence records to identify which factors drive business longevity, predict closure outcomes, and recommend locations
Streamlit application
Four pages covering survival curves by business type, closure risk factors, community-based location recommendations, and an open data explorer.
Survival curves
Kaplan-Meier curves segmented by business type showing survival probability over time
Risk factors
Cox proportional-hazards coefficients revealing which attributes drive closure risk
Location recommender
Composite scoring by community factoring survival rate, competition density, and business diversity
Explorer
Browse and filter the full licence dataset by type, community, status, and registration year
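The survival curves page is built on the Kaplan-Meier estimator. The project page does not say which library computes it, so the following is a dependency-free sketch of the estimator itself rather than the project's code. The names kaplan_meier and km_by_type, and the (business_type, duration, closed) record layout, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kaplan_meier(durations, closed):
    """Return [(t, S(t))] at each distinct closure time.

    durations: how long each business was observed (e.g. years licensed)
    closed:    True if the business closed at that time,
               False if it was still open (right-censored)
    """
    closures = Counter(t for t, c in zip(durations, closed) if c)
    exits = Counter(durations)  # every record leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = closures.get(t, 0)
        if d:
            # Product-limit step: multiply by the share that survives time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def km_by_type(records):
    """Fit one curve per business type.

    records: iterable of (business_type, duration, closed) tuples
             (hypothetical layout, not the project's schema).
    """
    grouped = defaultdict(lambda: ([], []))
    for business_type, duration, was_closed in records:
        grouped[business_type][0].append(duration)
        grouped[business_type][1].append(was_closed)
    return {bt: kaplan_meier(d, c) for bt, (d, c) in grouped.items()}


if __name__ == "__main__":
    # Toy data only: six businesses, two of them still open (censored).
    curve = kaplan_meier([1, 2, 2, 3, 4, 5],
                         [True, True, False, True, False, True])
    for t, s in curve:
        print(t, round(s, 3))
```

Censored records still count toward the at-risk total until their last observed time, which is what separates this estimate from a plain closure rate.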
Key results
Cox C-index
0.68
Concordance index from the Cox proportional-hazards model
AUC-ROC
0.86
XGBoost classifier for survived-vs-closed prediction
Businesses
22K
Business licence records analyzed from Calgary Open Data
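To make the C-index figure easier to read: the concordance index is the share of comparable business pairs in which the model assigns the higher risk to the business that closed first. The sketch below is a simplified pairwise version written for illustration. The function name concordance_index and the toy inputs are assumptions, and a library implementation would treat tied times more carefully.

```python
def concordance_index(durations, closed, risk):
    """Simplified Harrell's C for right-censored data.

    durations: observed time for each business
    closed:    True if the business closed, False if censored
    risk:      model risk score (higher means expected to close sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not closed[i]:
            continue  # a censored record cannot be the "closed first" side
        for j in range(n):
            # Pair is comparable when i closed strictly before j's observed time.
            if durations[i] < durations[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data only: four businesses, the third still open.
    c = concordance_index(
        durations=[2, 4, 6, 8],
        closed=[True, True, False, True],
        risk=[0.9, 0.4, 0.5, 0.1],
    )
    print(round(c, 2))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. AUC-ROC is the same pairwise idea applied to a binary survived-vs-closed label.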
Methodology
Fetched business licence and civic census data from Calgary Open Data. Computed Kaplan-Meier survival curves segmented by business type. Fitted a Cox proportional-hazards model to identify closure risk factors. Trained Random Forest and XGBoost classifiers, then built a composite location scoring function that weights survival rate (45%), competition density (30%), and business diversity (25%).
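The composite location score can be sketched as a weighted sum using the 45/30/25 weights stated above. Everything else here is an assumption for illustration: the min-max scaling, the choice to treat lower competition density as better, the field names, and the placeholder communities A, B, and C with made-up values.

```python
# Weights from the methodology: survival 45%, competition 30%, diversity 25%.
WEIGHTS = {"survival": 0.45, "competition": 0.30, "diversity": 0.25}


def min_max(values):
    """Scale values to [0, 1]; return 0.5 for all when they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_communities(rows):
    """Rank communities by composite score, best first.

    rows: list of dicts with keys name, survival_rate,
          competition_density, diversity (hypothetical field names).
    """
    survival = min_max([r["survival_rate"] for r in rows])
    # Assumption: denser competition is worse, so invert the scaled value.
    competition = [1.0 - c for c in
                   min_max([r["competition_density"] for r in rows])]
    diversity = min_max([r["diversity"] for r in rows])
    scored = [
        (
            r["name"],
            WEIGHTS["survival"] * s
            + WEIGHTS["competition"] * c
            + WEIGHTS["diversity"] * d,
        )
        for r, s, c, d in zip(rows, survival, competition, diversity)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder communities with invented numbers, not project data.
    toy = [
        {"name": "A", "survival_rate": 0.80, "competition_density": 10, "diversity": 0.6},
        {"name": "B", "survival_rate": 0.60, "competition_density": 2, "diversity": 0.9},
        {"name": "C", "survival_rate": 0.70, "competition_density": 6, "diversity": 0.3},
    ]
    for name, score in score_communities(toy):
        print(name, round(score, 3))
```

Scaling each factor to the same range before weighting keeps the weights meaningful. Without it, a factor measured in large units such as businesses per square kilometre would dominate the sum regardless of its weight.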