Data science and machine learning portfolio

R² 0.89

Building permit cost predictor

Predict construction costs from permit features using 484K+ historical Calgary records with XGBoost regression.

Urban planning Regression XGBoost Random Forest 484K rows

85% accuracy

Community crime classifier

Classify and analyze crime patterns across Calgary communities with demographic integration and risk scoring.

Public safety Classification Gradient Boosting 77K rows

80% accuracy

Traffic incident hotspot analyzer

Spatial clustering and temporal analysis of 60K+ traffic incidents with animated heatmaps and risk scoring.

Urban planning Clustering DBSCAN Classification 60K rows

R² 0.89

River flow forecasting

Forecast Bow River levels with LSTM, Prophet, and ARIMA using 9.5M+ five-minute interval measurements.

Environment Time series LSTM Prophet 9.5M rows

R² 0.88

Shelter occupancy predictor

Forecast emergency shelter demand using 83K+ daily occupancy records for proactive resource allocation.

Social services Time series Prophet XGBoost 83K rows

Silhouette 0.42

Neighborhood segmentation

Cluster 200+ Calgary communities into livability profiles using census, crime, business, and housing data.

Urban planning Clustering PCA K-Means 200+ communities

AUC 0.93

Dev permit approval predictor

Predict development permit approval likelihood with NLP on 189K+ project descriptions.

Urban planning NLP TF-IDF Classification 189K rows

R² 0.86

Solar energy forecaster

Forecast solar PV production across city facilities with seasonal decomposition and ROI analysis.

Energy Time series Regression Seasonal 2.3K records

C-index 0.68

Business survival recommender

Survival analysis on 22K+ business licences to identify longevity factors and recommend optimal locations.

Business Survival analysis Kaplan-Meier Cox PH 22K rows

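For readers unfamiliar with the Kaplan-Meier estimator named in this card, here is a minimal sketch of the idea: at each time where a closure is observed, the survival probability is multiplied by the fraction of still-active businesses that did not close. This is an illustration, not the project's code. The function name `kaplan_meier` and the toy inputs are assumptions, and ties are handled with the usual convention that events precede censoring at the same time.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each observed event time.

    durations: time until closure or until the record was censored.
    observed:  1/True if the closure was observed, 0/False if censored.
    """
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: share of the risk set that survives time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical example: five licences, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored records still count toward the risk set until they drop out, which is what separates this from a naive "fraction still open" calculation.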

75% precision

Water quality anomaly detection

Monitor multi-parameter water quality across Calgary's watershed with Isolation Forest anomaly detection.

Environment Anomaly detection Isolation Forest LOF 82 params / 7 sites

F1 0.79

311 service request router

NLP text classification on 500K+ citizen service requests for automatic department routing.

Public safety NLP TF-IDF Multi-label 500K rows

R² 0.77

Property assessment valuator

Estimate values for 600K+ properties with XGBoost, with SHAP-based explanations for each valuation.

Real estate Regression SHAP XGBoost 617K properties

R² 0.80

Transit ridership optimizer

Graph network analysis and demand forecasting for Calgary Transit route optimization.

Urban planning Time series NetworkX Graph analysis 7.7K stops

AUC 0.713

Customer churn prediction

Identify high-risk telecom customers before they cancel, with SHAP explainability and business impact analysis.

Telecom Classification SHAP XGBoost 5K customers

AUC 0.775

Propensity and upsell scoring

Score customer likelihood to respond to upsell campaigns with calibrated probabilities and decile analysis.

Customer analytics Propensity Calibration XGBoost 8K customers

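The decile analysis mentioned in this card can be sketched roughly as follows: rank customers by score, cut the ranking into ten equal buckets, and compare each bucket's response rate with the overall rate. This is an illustrative sketch, not the project's implementation. The name `decile_lift` and the synthetic data are assumptions, and it expects at least as many rows as buckets and at least one positive outcome.

```python
def decile_lift(scores, outcomes, buckets=10):
    """Return [(bucket_number, response_rate, lift)] with bucket 1 = highest scores."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    base_rate = sum(outcomes) / n
    table = []
    for i in range(buckets):
        # Integer slicing keeps bucket sizes within one row of each other.
        chunk = ranked[i * n // buckets:(i + 1) * n // buckets]
        rate = sum(outcome for _, outcome in chunk) / len(chunk)
        table.append((i + 1, rate, rate / base_rate))
    return table


# Synthetic example: 100 customers, responders concentrated in the top scores.
scores = [i / 100 for i in range(100)]
outcomes = [1 if i >= 80 else 0 for i in range(100)]
table = decile_lift(scores, outcomes)
```

A lift well above 1 in the first few deciles suggests a campaign can target only the top-ranked customers and still reach most responders.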

p < 0.001

A/B test framework

Unified experimentation framework with frequentist, Bayesian, and sequential testing methods.

Experimentation Experimentation Bayesian Sequential 30K rows

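As a small illustration of the frequentist side of such a framework, the sketch below runs a pooled two-proportion z-test on conversion counts. It is not taken from the project. The function name and the sample counts are hypothetical, and the Bayesian and sequential methods are not shown.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control converts 1,500 of 15,000, variant 1,800 of 15,000.
z, p_value = two_proportion_z_test(1500, 15000, 1800, 15000)
```

The pooled standard error assumes the null hypothesis of equal rates, which is the usual choice for a significance test. A confidence interval for the difference would use the unpooled version instead.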

AUC 0.97

Credit card fraud detection

Detect fraudulent transactions with SMOTE-balanced models and SHAP explainability on 10K synthetic transactions.

Fintech Classification SMOTE SHAP 10K transactions

MRR 0.82

RAG document question answering

Retrieval-augmented generation system using TF-IDF and BM25 over municipal policy documents. No external API dependencies.

AI engineering NLP BM25 TF-IDF 15 docs / 30 queries

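To give a feel for the retrieval step, here is a minimal BM25 scorer written with only the standard library. It is a sketch under assumptions, not the project's code: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, a Lucene-style non-negative IDF, and made-up document snippets.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical policy snippets.
docs = [
    "parking bylaw fines and permits",
    "snow removal schedule for residential streets",
    "noise bylaw quiet hours",
]
scores = bm25_scores("snow removal", docs)
best = scores.index(max(scores))
```

The top-scoring passages would then be handed to the answer-generation step. Because everything here is term statistics, it runs with no external API.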

Automated pipeline

MLOps deployment pipeline

End-to-end ML pipeline with PSI-based drift detection, model versioning, champion/challenger deployment, and Docker containerization.

MLOps Docker CI/CD Drift detection Pipeline

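The PSI (Population Stability Index) drift check named in this card compares a feature's binned distribution in a baseline sample with its distribution in current data, summing (actual - expected) * ln(actual / expected) over the bins. The sketch below is illustrative only and is not the pipeline's code. The function name `psi`, the ten equal-width bins, and the `eps` floor for empty bins are assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic example: the current sample has shifted toward higher values.
baseline = [i / 100 for i in range(100)]
current = [0.5 + i / 200 for i in range(100)]
drift = psi(baseline, current)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though thresholds are worth tuning per feature.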

F1 0.87

NLP sentiment analysis

Multi-class sentiment classification on 5K product reviews with TF-IDF vectorization and SVM, achieving 89% accuracy.

NLP NLP TF-IDF SVM 5K reviews

AUC 0.94

Predictive maintenance

Predict equipment failure from 15K sensor readings across 50 machines with survival analysis and cost-optimized thresholds.

Industrial Classification Survival analysis SHAP 15K readings

NDCG@10 0.78

Recommendation engine

Hybrid recommendation system combining collaborative filtering, content-based filtering, and SVD, built on 50K ratings from 2K users and 500 items.

E-commerce Collaborative filtering SVD Hybrid 50K ratings

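The NDCG@10 figure on this card is a ranking metric: it rewards placing the most relevant items near the top of each user's list. A minimal sketch of how such a score can be computed is below. It assumes the linear-gain variant of DCG and is not the project's evaluation code. The function names are hypothetical.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked list, where relevances[i] is the true relevance
    of the item the recommender placed at position i."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranked list: the most relevant item was placed third.
score = ndcg_at_k([1, 0, 3, 2, 0])
```

Averaging this value over all users gives a single number for the recommender. A list already in ideal order scores exactly 1.0.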

R² 0.85

Geospatial demand forecast

Predict ride demand across 30 Calgary zones using spatial cross-validation, cyclical encoding, and LightGBM.

Urban planning Regression Spatial CV LightGBM 20K records

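Cyclical encoding, one of the techniques listed for this project, maps a periodic feature such as hour of day onto a circle so that 23:00 and 00:00 end up close together instead of 23 units apart. The sketch below shows the usual sine/cosine form. The helper name `cyclical_encode` is an assumption, and the project's own feature code may differ.

```python
import math


def cyclical_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (hour, weekday, month) to (sin, cos) coordinates."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


# Hour 23 and hour 0 are neighbours on the circle, just like hour 0 and hour 1.
late = cyclical_encode(23, 24)
midnight = cyclical_encode(0, 24)
gap = math.dist(late, midnight)
```

Both components are needed: the sine alone cannot distinguish 03:00 from 09:00, for example, because they share the same sine value.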

Employee engagement analyzer

Predict employee disengagement from HR recognition data, with behavioral trend detection, department benchmarks, and retention recommendations.

HR analytics Classification Segmentation LightGBM 8K employees

SaaS usage analytics

Product analytics for SaaS platforms: cohort retention, feature adoption tracking, at-risk account identification, and usage dashboards.

SaaS / Product Cohort analysis Retention Churn 3K users

Industry benchmark engine

Compute industry-standard KPI benchmarks across 8 sectors, with percentile rankings, peer comparison, gap analysis, and custom report generation.

HR analytics Benchmarking Percentiles Segmentation 500 companies

23 projects.
Real data.
Real models.

Flagship projects

Customer churn prediction

Propensity and upsell scoring

A/B test framework

Project catalog

Building permit cost predictor

Community crime classifier

Traffic incident hotspot analyzer

River flow forecasting

Shelter occupancy predictor

Neighborhood segmentation

Dev permit approval predictor

Solar energy forecaster

Business survival recommender

Water quality anomaly detection

311 service request router

Property assessment valuator

Transit ridership optimizer

Customer churn prediction

Propensity and upsell scoring

A/B test framework

Credit card fraud detection

RAG document question answering

MLOps deployment pipeline

NLP sentiment analysis

Predictive maintenance

Recommendation engine

Geospatial demand forecast

Employee engagement analyzer

SaaS usage analytics

Industry benchmark engine

How I work

Skills and tools

Technical

Domain expertise

About

At a glance
