Project 07
Applying for a development permit in Calgary involves significant time, cost, and uncertainty. This project uses 189,000+ historical permits and NLP on their free-text descriptions to estimate the probability of approval before submission.
Interactive app
Dataset summary with approval rates, permit types, and community-level statistics
TF-IDF analysis of permit descriptions showing which terms predict approval or refusal
ROC curves, confusion matrices, and classifier comparison across four models
Enter permit details and description text to get a real-time approval probability estimate
Performance
Approach
Fetched 189,000+ development permits from Calgary Open Data via the Socrata API. Created a binary target, approved vs. not approved, where 75% of permits are approved and that share serves as the baseline. Applied TF-IDF vectorization to the cleaned permit descriptions (500 features, bigrams) and combined the resulting NLP features with categorical encodings for land-use district, community, and quadrant. Trained and compared Logistic Regression, Random Forest, Gradient Boosting, and XGBoost classifiers.
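As a rough illustration of the feature pipeline described above, the sketch below combines TF-IDF text features with encoded categorical columns and fits one of the four model types using scikit-learn. It is not the project's actual code: the column names, the toy rows, the use of one-hot encoding for the categorical fields, and the reading of "bigrams" as unigrams plus bigrams are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows standing in for the real permit data (column names are hypothetical).
permits = pd.DataFrame({
    "description": [
        "new secondary suite in basement",
        "addition to single detached dwelling",
        "change of use to retail store",
        "new rear detached garage",
        "relaxation of parking for liquor store",
        "temporary sign on commercial lot",
        "new semi detached dwelling",
        "change of use to cannabis store",
    ],
    "land_use_district": ["R-C1", "R-C1", "C-N1", "R-C2", "C-N1", "C-N1", "R-C2", "C-N1"],
    "community": ["Bowness", "Acadia", "Beltline", "Bowness", "Beltline", "Acadia", "Acadia", "Beltline"],
    "quadrant": ["NW", "SE", "SW", "NW", "SW", "SE", "SE", "SW"],
    "approved": [1, 1, 1, 1, 0, 1, 1, 0],
})

categorical_cols = ["land_use_district", "community", "quadrant"]

# TF-IDF on the description (capped at 500 features, unigrams + bigrams),
# side by side with one-hot encodings of the categorical columns.
features = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=500, ngram_range=(1, 2)), "description"),
    ("cats", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(permits.drop(columns="approved"), permits["approved"])

# Score a hypothetical new application before submission.
new_permit = pd.DataFrame([{
    "description": "new secondary suite with rear detached garage",
    "land_use_district": "R-C1",
    "community": "Bowness",
    "quadrant": "NW",
}])
approval_probability = model.predict_proba(new_permit)[0, 1]
print(f"Estimated approval probability: {approval_probability:.2f}")
```

Because the classifier is the last step of the pipeline, swapping in a Random Forest or a gradient-boosted model for comparison only requires changing the `clf` step; the feature construction stays the same.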