Project 01

Building permit cost predictor

Construction stakeholders in Calgary need reliable cost estimates early in the planning process. This project uses 484K+ historical building permits and machine learning to predict project costs from permit characteristics, location, and scope.

XGBoost · Regression · Feature engineering · Calgary open data · 484K permits
0.89 R-squared (XGBoost)

Streamlit dashboard pages

01

Overview

Dataset summary, distributions, and key statistics across 484K building permits

02

Exploratory analysis

Cost distributions, community comparisons, and temporal trends with interactive charts

03

Model performance

Compare Ridge, Random Forest, Gradient Boosting, and XGBoost on key regression metrics

04

Feature importance

Community-level aggregates and permit characteristics driving cost predictions
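
A minimal sketch of how community-level aggregates like these could be built, using only the Python standard library. The field names `community` and `cost`, the output keys, and the sample rows are assumptions for illustration, not the project's actual schema or data.

```python
from collections import defaultdict
from statistics import mean, median


def community_aggregates(permits):
    """Group permit costs by community and summarize each group."""
    costs_by_community = defaultdict(list)
    for permit in permits:
        costs_by_community[permit["community"]].append(permit["cost"])
    return {
        name: {
            "avg_cost": mean(costs),
            "median_cost": median(costs),
            "permit_count": len(costs),
        }
        for name, costs in costs_by_community.items()
    }


# Made-up rows for illustration only (hypothetical field names and values).
sample = [
    {"community": "A", "cost": 100_000.0},
    {"community": "A", "cost": 300_000.0},
    {"community": "A", "cost": 200_000.0},
    {"community": "B", "cost": 50_000.0},
]
aggregates = community_aggregates(sample)
```

In a train/test setup, aggregates like these are usually computed on the training split only and then joined onto the test rows, so that test-set costs do not leak into the features.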

Key results

0.89
R-squared
XGBoost regressor

~$30K
Mean absolute error
On log-transformed costs

484K+
Permits analyzed
Calgary open data API
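
For readers who want the two metrics spelled out, here is a small sketch of R-squared and mean absolute error in plain Python. It evaluates the same predictions on the log scale and on the dollar scale to show how the two relate. The numbers are made up, and the helper names are illustrative rather than taken from the project.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up costs in dollars, for illustration only.
actual_dollars = [50_000.0, 120_000.0, 400_000.0]
predicted_dollars = [60_000.0, 100_000.0, 450_000.0]

# The same values on the log scale (log1p keeps zero-cost permits finite).
actual_log = [math.log1p(v) for v in actual_dollars]
predicted_log = [math.log1p(v) for v in predicted_dollars]

r2_log = r_squared(actual_log, predicted_log)
mae_log = mean_absolute_error(actual_log, predicted_log)
mae_dollars = mean_absolute_error(actual_dollars, predicted_dollars)
```

An MAE computed on log values is unitless and reads roughly as a relative error, while an MAE in dollars requires transforming predictions back before comparing.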

Methodology

Fetched 484,000+ permits from the Calgary Open Data API via Socrata. Cleaned and log-transformed the heavily right-skewed cost distribution. Engineered community-level aggregates (average and median cost, permit counts) plus temporal features. Trained and compared Ridge, Random Forest, Gradient Boosting, and XGBoost regressors, then selected XGBoost as the best performer by R-squared and MAE.
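
To illustrate why the log transform helps with a right-skewed target, the sketch below applies `math.log1p` to a handful of made-up costs and inverts it with `math.expm1`. The values are invented for the example, and `log1p` is one common choice of transform; the project may have used a different variant.

```python
import math
from statistics import mean, median

# Made-up permit costs: mostly small jobs plus one very large project.
costs = [8_000.0, 15_000.0, 22_000.0, 40_000.0, 2_500_000.0]

# log1p(x) = log(1 + x) compresses the long right tail.
log_costs = [math.log1p(c) for c in costs]

# expm1 is the exact inverse, used to turn predictions back into dollars.
recovered = [math.expm1(v) for v in log_costs]

raw_skew_ratio = mean(costs) / median(costs)          # far above 1: tail dominates
log_skew_ratio = mean(log_costs) / median(log_costs)  # close to 1 after the transform
```

On the raw scale the single large project drags the mean far above the median; on the log scale the two sit close together, which gives the regressors a better-behaved target.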

01 Fetch 484K permits via Socrata API
02 Clean and log-transform costs
03 Engineer community aggregates
04 Train four regression models
05 Evaluate and select XGBoost
06 Deploy Streamlit dashboard
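
Step 01 can be sketched with the standard library alone. Socrata's SODA endpoints accept the `$limit`, `$offset`, and `$order` query parameters, so a large dataset can be pulled page by page. The portal domain, the placeholder dataset ID `xxxx-xxxx`, the page size, and the function names are assumptions for illustration; the real dataset identifier is not given here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.calgary.ca/resource"  # assumed portal domain
DATASET_ID = "xxxx-xxxx"  # placeholder, not the real dataset ID


def page_url(dataset_id, limit, offset):
    """Build the URL for one page of results, ordered for stable paging."""
    query = urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"},
        safe="$:",  # keep the SoQL parameter names readable
    )
    return f"{BASE_URL}/{dataset_id}.json?{query}"


def fetch_all(dataset_id, page_size=50_000):
    """Request pages until an empty one comes back, then return all rows."""
    rows, offset = [], 0
    while True:
        with urlopen(page_url(dataset_id, page_size, offset)) as response:
            page = json.load(response)
        if not page:
            return rows
        rows.extend(page)
        offset += page_size


first_page = page_url(DATASET_ID, limit=1000, offset=0)
```

Ordering by `:id` keeps the paging stable between requests, so rows are neither skipped nor repeated as the offset advances.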