Project 01
Construction stakeholders in Calgary need reliable cost estimates early in the planning process. This project uses 484K+ historical building permits and machine learning to predict project costs from permit characteristics, location, and scope.
Interactive app
Dataset summary, distributions, and key statistics across 484K+ building permits
Cost distributions, community comparisons, and temporal trends with interactive charts
Compare Ridge, Random Forest, Gradient Boosting, and XGBoost on key regression metrics
Community-level aggregates and permit characteristics driving cost predictions
Performance
Approach
Fetched 484,000+ permits from the Calgary Open Data API via Socrata. Cleaned the records and log-transformed project cost, the prediction target, because its distribution is heavily right-skewed. Engineered community-level aggregates (average and median cost, permit counts) plus temporal features. Trained and compared Ridge, Random Forest, Gradient Boosting, and XGBoost regressors, selecting the best performer by R-squared and MAE.
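The feature-engineering and scoring steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual pipeline: the field names (`community`, `issued_year`, `cost`), the sample rows, and the helper functions are all hypothetical, and the use of `log1p` (rather than a plain log) is an assumption. Model training is left out; the sketch only shows the log transform, the community-level aggregates, and the two metrics used for model selection.

```python
import math
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample rows; the real dataset's column names may differ.
permits = [
    {"community": "BELTLINE", "issued_year": 2019, "cost": 250_000.0},
    {"community": "BELTLINE", "issued_year": 2020, "cost": 1_200_000.0},
    {"community": "BELTLINE", "issued_year": 2021, "cost": 40_000.0},
    {"community": "VARSITY", "issued_year": 2020, "cost": 15_000.0},
    {"community": "VARSITY", "issued_year": 2021, "cost": 60_000.0},
]


def add_log_cost(rows):
    """Return copies of the rows with log1p(cost) added as the target.

    log1p compresses the long right tail and is defined for a cost of 0.
    """
    return [{**r, "log_cost": math.log1p(r["cost"])} for r in rows]


def community_aggregates(rows):
    """Compute average cost, median cost, and permit count per community."""
    costs_by_community = defaultdict(list)
    for r in rows:
        costs_by_community[r["community"]].append(r["cost"])
    return {
        name: {
            "avg_cost": mean(costs),
            "median_cost": median(costs),
            "permit_count": len(costs),
        }
        for name, costs in costs_by_community.items()
    }


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rows = add_log_cost(permits)
aggregates = community_aggregates(permits)

# Attach each permit's community aggregates as model features.
features = [{**r, **aggregates[r["community"]]} for r in rows]
```

When aggregates like these are built from the target itself, computing them on the training split only and then mapping them onto the validation rows avoids leaking held-out costs into the features. Metrics computed on `log_cost` are in log units; applying `math.expm1` to predictions first gives an MAE in dollars.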