Project 07

Development permit approval predictor

Applying for a development permit in Calgary involves significant time, cost, and uncertainty. This project uses 189K+ historical permits and NLP on free-text descriptions to estimate the probability of approval before submission.

XGBoost, NLP / TF-IDF, Classification, Calgary open data, 189K permits
0.93 AUC-ROC (XGBoost)

Streamlit dashboard pages

01

Overview

Dataset summary with approval rates, permit types, and community-level statistics

02

NLP features

TF-IDF analysis of permit descriptions showing which terms predict approval or refusal

03

Model performance

ROC curves, confusion matrices, and classifier comparison across four models

04

Approval predictor

Enter permit details and description text to get a real-time approval probability estimate

Key results

0.93 AUC-ROC (XGBoost classifier)
87% accuracy (with NLP + categorical features)
189K+ development permits (with free-text descriptions)
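
The two headline metrics measure different things, and a small sketch may help show why both are reported. The code below is an illustration only, not the project's evaluation code: the labels and scores are invented toy values, and the helper names `auc_roc` and `accuracy` are hypothetical. AUC-ROC is computed here as the share of (approved, not approved) pairs in which the approved permit gets the higher score, which is equivalent to the area under the ROC curve.

```python
def auc_roc(labels, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of permits classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data invented for illustration: 1 = approved, 0 = not approved.
labels = [1, 1, 1, 0, 1, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.95, 0.4]

# Accuracy of always predicting the majority class ("approved").
majority_baseline = max(sum(labels), len(labels) - sum(labels)) / len(labels)

print(f"AUC-ROC:  {auc_roc(labels, scores):.3f}")
print(f"Accuracy: {accuracy(labels, scores):.3f}")
print(f"Baseline: {majority_baseline:.3f}")
```

In this toy case accuracy at a 0.5 threshold is no better than always predicting "approved", while AUC-ROC still shows that the scores rank permits usefully. That is why a threshold-free ranking metric is worth reporting alongside accuracy on an imbalanced target.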

Methodology

Fetched 189,000+ development permits from Calgary Open Data via the Socrata API and created a binary target: approved vs. not approved (baseline approval rate: 75%). Applied TF-IDF vectorization to the cleaned permit descriptions (500 features, bigrams), then combined those NLP features with categorical encodings for land-use district, community, and quadrant. Trained and compared four classifiers: Logistic Regression, Random Forest, Gradient Boosting, and XGBoost.
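
As a rough sketch of the first two steps, the snippet below builds paged Socrata request URLs and maps a status string to the binary target. Several details are assumptions rather than facts from this project: the dataset identifier `xxxx-xxxx` is a placeholder, the page size is arbitrary, and the status labels in `APPROVED_STATUSES` are hypothetical, so the real dataset's values should be checked first. No network request is made here; the URLs would be passed to an HTTP client.

```python
from urllib.parse import urlencode

# Placeholder: substitute the real dataset identifier from the open data portal.
DATASET_ID = "xxxx-xxxx"
BASE_URL = f"https://data.calgary.ca/resource/{DATASET_ID}.json"


def page_urls(total_rows, page_size=50_000):
    """Build one URL per page using the SoQL $limit / $offset parameters."""
    urls = []
    for offset in range(0, total_rows, page_size):
        # A stable $order keeps rows from shifting between pages.
        query = urlencode({"$limit": page_size, "$offset": offset, "$order": ":id"})
        urls.append(f"{BASE_URL}?{query}")
    return urls


# Hypothetical status labels; the real dataset may use different values.
APPROVED_STATUSES = {"approved", "released"}


def approval_target(status):
    """Return 1 for an approved permit and 0 for anything else."""
    if status is None:
        return 0
    return int(status.strip().lower() in APPROVED_STATUSES)


urls = page_urls(189_000)
targets = [approval_target(s) for s in ["Approved", "Refused", None, " released "]]
```

Treating every non-approved status as 0, including a missing one, is a simplification made for this sketch. In practice, permits that are still pending would probably be filtered out before labelling.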

01 Fetch 189K development permits
02 Create binary approval target
03 TF-IDF on permit descriptions
04 Combine NLP + categorical features
05 Train four classifiers
06 Deploy Streamlit dashboard
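
Steps 03 to 05 could be sketched roughly as follows with scikit-learn and pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the column names are hypothetical, the six rows and their labels are invented, and Logistic Regression stands in for all four classifiers. Only the TF-IDF settings (500 features, bigrams) come from the methodology above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows invented for illustration; column names are hypothetical.
permits = pd.DataFrame({
    "description": [
        "new secondary suite in basement",
        "change of use to liquor store",
        "addition to single detached dwelling",
        "relaxation of rear yard setback for garage",
        "temporary sign on residential lot",
        "home occupation class 2",
    ],
    "land_use_district": ["R-C1", "C-N1", "R-C1", "R-C2", "R-C2", "R-C1"],
    "community": ["Bowness", "Beltline", "Bowness", "Ogden", "Ogden", "Beltline"],
    "quadrant": ["NW", "SW", "NW", "SE", "SE", "SW"],
    "approved": [1, 0, 1, 1, 0, 1],
})

categorical = ["land_use_district", "community", "quadrant"]

# TF-IDF on the text column, one-hot encoding on the categorical columns.
features = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=500, ngram_range=(1, 2)), "description"),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(permits.drop(columns="approved"), permits["approved"])

# Score a new, equally hypothetical application.
new_permit = pd.DataFrame([{
    "description": "secondary suite in existing garage",
    "land_use_district": "R-C1",
    "community": "Bowness",
    "quadrant": "NW",
}])
probability = model.predict_proba(new_permit)[0, 1]
print(f"Estimated approval probability: {probability:.2f}")
```

Keeping the vectorizer and encoder inside one pipeline means the same transformations are applied at training and prediction time. Any estimator with a scikit-learn interface, such as `XGBClassifier` from the xgboost package, could replace the final step for a comparison.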