Portfolio

23 projects.
Real data.
Real models.

A machine learning portfolio built on Calgary Open Data, customer analytics, and statistical experimentation.


Project catalog

End-to-end machine learning projects spanning classification, regression, clustering, time-series forecasting, NLP, survival analysis, and anomaly detection.

Urban planning (6), Telecom & customer analytics (3), Environment & energy (3), Public safety (2), NLP (2), Fintech (1), AI engineering (1), MLOps (1), Industrial (1), E-commerce (1), Real estate (1), Social services (1), Business (1)
R² 0.89

Building permit cost predictor

Predict construction costs from permit features using 484K+ historical Calgary records with XGBoost regression.

Urban planning Regression XGBoost Random Forest 484K rows
Code Details
85% accuracy

Community crime classifier

Classify and analyze crime patterns across Calgary communities with demographic integration and risk scoring.

Public safety Classification Gradient Boosting 77K rows
Code Details
80% accuracy

Traffic incident hotspot analyzer

Spatial clustering and temporal analysis of 60K+ traffic incidents with animated heatmaps and risk scoring.

Urban planning Clustering DBSCAN Classification 60K rows
Code Details
R² 0.89

River flow forecasting

Forecast Bow River levels with LSTM, Prophet, and ARIMA using 9.5M+ five-minute interval measurements.

Environment Time series LSTM Prophet 9.5M rows
Code Details
R² 0.88

Shelter occupancy predictor

Forecast emergency shelter demand using 83K+ daily occupancy records for proactive resource allocation.

Social services Time series Prophet XGBoost 83K rows
Code Details
Silhouette 0.42

Neighborhood segmentation

Cluster 200+ Calgary communities into livability profiles using census, crime, business, and housing data.

Urban planning Clustering PCA K-Means 200+ communities
Code Details
AUC 0.93

Dev permit approval predictor

Predict development permit approval likelihood with NLP on 189K+ project descriptions.

Urban planning NLP TF-IDF Classification 189K rows
Code Details
R² 0.86

Solar energy forecaster

Forecast solar PV production across city facilities with seasonal decomposition and ROI analysis.

Energy Time series Regression Seasonal 2.3K records
Code Details
C-index 0.68

Business survival recommender

Survival analysis on 22K+ business licences to identify longevity factors and recommend optimal locations.

Business Survival analysis Kaplan-Meier Cox PH 22K rows
Code Details
75% precision

Water quality anomaly detection

Monitor multi-parameter water quality across Calgary's watershed with Isolation Forest anomaly detection.

Environment Anomaly detection Isolation Forest LOF 82 params / 7 sites
Code Details
F1 0.79

311 service request router

NLP text classification on 500K+ citizen service requests for automatic department routing.

Public safety NLP TF-IDF Multi-label 500K rows
Code Details
R² 0.77

Property assessment valuator

Estimate 500K+ property values with XGBoost and SHAP-based explainability for each valuation.

Real estate Regression SHAP XGBoost 617K properties
Code Details
R² 0.80

Transit ridership optimizer

Graph network analysis and demand forecasting for Calgary Transit route optimization.

Urban planning Time series NetworkX Graph analysis 7.7K stops
Code Details
AUC 0.713

Customer churn prediction

Identify high-risk telecom customers before they cancel, with SHAP explainability and business impact analysis.

Telecom Classification SHAP XGBoost 5K customers
Code Details
AUC 0.775

Propensity and upsell scoring

Score customer likelihood to respond to upsell campaigns with calibrated probabilities and decile analysis.

Customer analytics Propensity Calibration XGBoost 8K customers
Code Details
p < 0.001

A/B test framework

Unified experimentation framework with frequentist, Bayesian, and sequential testing methods.

Experimentation Bayesian Sequential 30K rows
Code Details
AUC 0.97

Credit card fraud detection

Detect fraudulent transactions with SMOTE-balanced models and SHAP explainability on 10K synthetic transactions.

Fintech Classification SMOTE SHAP 10K transactions
Code Details
MRR 0.82

RAG document question answering

Retrieval-augmented generation system using TF-IDF and BM25 over municipal policy documents. No external API dependencies.

AI engineering NLP BM25 TF-IDF 15 docs / 30 queries
Code Details
Automated pipeline

MLOps deployment pipeline

End-to-end ML pipeline with PSI-based drift detection, model versioning, champion/challenger deployment, and Docker containerization.

MLOps Docker CI/CD Drift detection Pipeline
Code Details
F1 0.87

NLP sentiment analysis

Multi-class sentiment classification on 5K product reviews with TF-IDF vectorization and SVM, achieving 89% accuracy.

NLP TF-IDF SVM 5K reviews
Code Details
AUC 0.94

Predictive maintenance

Predict equipment failure from 15K sensor readings across 50 machines with survival analysis and cost-optimized thresholds.

Industrial Classification Survival analysis SHAP 15K readings
Code Details
NDCG@10 0.78

Recommendation engine

Hybrid recommendation system combining collaborative filtering, content-based filtering, and SVD, built on 50K ratings from 2K users and 500 items.

E-commerce Collaborative filtering SVD Hybrid 50K ratings
Code Details
R² 0.85

Geospatial demand forecast

Predict ride demand across 30 Calgary zones using spatial cross-validation, cyclical encoding, and LightGBM.

Urban planning Regression Spatial CV LightGBM 20K records
Code Details

Employee engagement analyzer

Predict employee disengagement from HR recognition data, with behavioral trend detection, department benchmarks, and retention recommendations.

HR analytics Classification Segmentation LightGBM 8K employees
Code Details

SaaS usage analytics

Product analytics for SaaS platforms: cohort retention, feature adoption tracking, at-risk account identification, and usage dashboards.

SaaS / Product Cohort analysis Retention Churn 3K users
Code Details

Industry benchmark engine

Compute industry-standard KPI benchmarks across 8 sectors, with percentile rankings, peer comparison, gap analysis, and custom report generation.

HR analytics Benchmarking Percentiles Segmentation 500 companies
Code Details

How I work

Every project follows a disciplined end-to-end pipeline, from raw data through production-ready models and business insight.

1 Ingest 2 Explore 3 Engineer 4 Model 5 Evaluate 6 Deploy 7 Monitor
01
Data ingestion
Pull raw data from Calgary Open Data (Socrata API), public datasets, or synthetic generators. Validate schemas, handle encoding, and store clean parquet snapshots.
Socrata API pandas requests
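
A minimal sketch of how paged requests against a Socrata (SODA) endpoint might be built for the ingestion step. It assumes the portal follows the usual SODA layout with `$limit`, `$offset`, and `$order` query parameters. The helper name `page_urls` and the dataset ID `abcd-1234` are placeholders for illustration and do not come from any project here.

```python
from urllib.parse import urlencode

# Assumption: Calgary's portal uses the standard Socrata resource URL layout.
BASE_URL = "https://data.calgary.ca/resource"


def page_urls(dataset_id: str, total_rows: int, page_size: int = 1000) -> list:
    """Build one SODA query URL per page using $limit/$offset paging."""
    urls = []
    for offset in range(0, total_rows, page_size):
        # Ordering by the system field ":id" keeps paging stable between requests.
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        # Keep "$" and ":" unescaped so the query reads like the SODA docs.
        query = urlencode(params, safe="$:")
        urls.append(f"{BASE_URL}/{dataset_id}.json?{query}")
    return urls


# "abcd-1234" is a hypothetical dataset identifier.
for url in page_urls("abcd-1234", total_rows=2500):
    print(url)
```

Each URL can then be fetched with `requests.get` and the pages concatenated before schema validation and the parquet snapshot.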
02
Exploratory analysis
Profile distributions, detect outliers, test correlations, and surface domain-specific patterns. Every notebook starts with a statistical summary and visual audit.
matplotlib seaborn plotly
03
Feature engineering
Transform raw columns into predictive signals: lag features, rolling statistics, target encoding, interaction terms, and domain-specific ratios.
scikit-learn NumPy category_encoders
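
To make the lag and rolling-statistic features concrete, here is a dependency-free sketch. The function names `lag` and `rolling_mean` are illustrative assumptions. In practice the same result would more likely come from pandas (`shift` and `rolling`), and the toy series below is not project data.

```python
from statistics import mean
from typing import List, Optional


def lag(values: List[float], k: int) -> List[Optional[float]]:
    """Shift the series forward by k steps; the first k entries have no history."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [None] * min(k, len(values)) + values[: max(len(values) - k, 0)]


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Mean of the previous `window` values, excluding the current one.

    Leaving out the current value keeps the feature free of target leakage
    when the same column is also what the model predicts.
    """
    out: List[Optional[float]] = []
    for i in range(len(values)):
        past = values[max(i - window, 0): i]
        out.append(mean(past) if len(past) == window else None)
    return out


# Toy daily series, purely illustrative.
series = [10.0, 12.0, 11.0, 15.0, 14.0]
print(lag(series, 1))
print(rolling_mean(series, 3))
```

Rows whose features are `None` have too little history and are usually dropped before training.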
04
Model training
Train multiple candidates with cross-validation. Hyperparameter search via Optuna or grid search. Compare baselines against gradient boosting, neural nets, and ensembles.
XGBoost LightGBM TensorFlow
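
The comparison of a baseline against a stronger candidate under cross-validation can be sketched without any ML library. Everything here is an assumption for illustration: the fold helper, the two toy "models" (a mean predictor and a one-feature least-squares line), and the synthetic data. The real projects would put XGBoost or LightGBM into the same loop.

```python
from statistics import mean


def kfold_indices(n: int, k: int):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: one-feature ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_mae(fit, xs, ys, k=5):
    """Mean absolute error averaged over k held-out folds."""
    fold_errors = []
    for train, test in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean([abs(model(xs[i]) - ys[i]) for i in test]))
    return mean(fold_errors)


# Synthetic, perfectly linear data; not taken from any project.
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]
print("baseline MAE:", cv_mae(fit_mean, xs, ys))
print("line MAE:", cv_mae(fit_line, xs, ys))
```

The folds are contiguous for simplicity. Shuffling suits i.i.d. data, while time-ordered splits are the safer choice for the forecasting projects.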
05
Evaluation and explainability
Score on held-out data with business-relevant metrics. Generate SHAP explanations, calibration curves, and cost-benefit matrices to justify model decisions.
SHAP SciPy calibration
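
A calibration curve compares predicted probabilities with observed outcome rates, bin by bin. The sketch below uses equal-width bins and assumes binary labels. The function name and the toy inputs are illustrative, and scikit-learn's `calibration_curve` would be the usual tool in practice.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve


# Toy labels and scores, purely illustrative.
labels = [0, 0, 1, 1, 1]
scores = [0.10, 0.15, 0.55, 0.90, 0.95]
for mean_pred, observed, count in calibration_curve(labels, scores, n_bins=2):
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

A well-calibrated model has the predicted and observed values close to each other in every bin. Large gaps suggest recalibrating before the scores feed a cost-benefit analysis.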
06
Deployment and monitoring
Ship interactive Streamlit dashboards with live predictions. Package models as reproducible pipelines. Track data drift and retrain triggers for production stability.
Streamlit Docker GitHub Actions
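
Drift tracking can be illustrated with the Population Stability Index (PSI), the measure named in the MLOps project above. This is a sketch under stated assumptions: equal-width bins taken from the reference sample, a small `eps` floor to avoid division by zero, and synthetic data. The binning and thresholds in the actual pipeline may differ.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference sample

    def shares(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the log ratio stays finite.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic feature values, purely illustrative.
reference = [float(i) for i in range(100)]
shifted = [x + 30.0 for x in reference]
print("no drift:", psi(reference, reference))
print("shifted:", psi(reference, shifted))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a signal to investigate or retrain, though these cutoffs are conventions, not guarantees.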

Skills and tools

Technical depth across the full machine learning lifecycle, from data engineering to model deployment.

Technical

Languages
Python SQL Julia
Machine learning
scikit-learn XGBoost LightGBM SHAP Prophet ARIMA LSTM
Data
pandas NumPy SciPy
Visualization
Matplotlib Seaborn Plotly Streamlit
Cloud
AWS GCP
Tools
Git Docker Jupyter

Domain expertise

Customer analytics: Churn, propensity, CLV
Experimentation: A/B testing, Bayesian inference
Time-series forecasting: ARIMA, Prophet, LSTM
NLP: TF-IDF, text classification
Geospatial analysis: Clustering, heatmaps
MLOps fundamentals: Pipelines, versioning

About

Ola K. is a data scientist based in Canada with a background in mathematics, data science, and applied machine learning.

This portfolio was built to demonstrate end-to-end ML capabilities on real-world data. The first 13 projects use Calgary Open Data covering urban transportation, public safety, environment, and real estate. Projects 14 through 16 focus on customer analytics and experimentation for telecom and subscription businesses.

Core interests include customer analytics, experimentation design, production ML systems, and using statistical rigor to drive business decisions.

Currently seeking data science roles focused on customer analytics, experimentation, and production ML systems. Open to remote positions across Canada.

At a glance

Location: Canada (open to remote)
Focus: Applied ML, customer analytics
Background: Mathematics, data science
Projects: 23 end-to-end
Availability: Open to opportunities