A machine learning portfolio spanning Calgary Open Data projects, customer analytics, and statistical experimentation.
Customer analytics, propensity modeling, and experimentation frameworks designed for telecom and subscription businesses.
85% of churners caught, with $141K/year in projected savings from targeted retention interventions.
Top 3 deciles capture 63% of responders, for a 2,155% campaign ROI with targeted outreach.
End-to-end machine learning projects spanning classification, regression, time-series forecasting, NLP, and more.
Predict construction costs from permit features using 484K+ historical Calgary records with XGBoost regression.
Classify and analyze crime patterns across Calgary communities with demographic integration and risk scoring.
Spatial clustering and temporal analysis of 60K+ traffic incidents with animated heatmaps and risk scoring.
Forecast Bow River levels with LSTM, Prophet, and ARIMA using 9.5M+ five-minute interval measurements.
Forecast emergency shelter demand using 83K+ daily occupancy records for proactive resource allocation.
Cluster 200+ Calgary communities into livability profiles using census, crime, business, and housing data.
Predict development permit approval likelihood with NLP on 189K+ project descriptions.
Forecast solar PV production across city facilities with seasonal decomposition and ROI analysis.
Survival analysis on 22K+ business licences to identify longevity factors and recommend optimal locations.
Monitor multi-parameter water quality across Calgary's watershed with Isolation Forest anomaly detection.
NLP text classification on 500K+ citizen service requests for automatic department routing.
Estimate 500K+ property values with XGBoost and SHAP-based explainability for each valuation.
Graph network analysis and demand forecasting for Calgary Transit route optimization.
Identify high-risk telecom customers before they cancel, with SHAP explainability and business impact analysis.
Score customer likelihood to respond to upsell campaigns with calibrated probabilities and decile analysis.
Unified experimentation framework with frequentist, Bayesian, and sequential testing methods.
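As a rough illustration of the decile analysis mentioned above (figures such as "top 3 deciles capture 63% of responders" are usually read off a table like this), here is a minimal sketch in plain Python. It is not the portfolio's actual code; the function name `cumulative_capture_by_decile` and its parameters are hypothetical, and it assumes binary labels where 1 marks a responder.

```python
from typing import List, Sequence


def cumulative_capture_by_decile(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[float]:
    """Return the cumulative share of positives captured after each bin,
    with rows ranked from highest to lowest score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank rows by descending score; ties keep input order (stable sort).
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n = len(ranked)

    captured: List[float] = []
    running = 0
    start = 0
    for b in range(1, n_bins + 1):
        # Integer bin edges so every row lands in exactly one bin.
        end = (b * n) // n_bins
        running += sum(ranked[start:end])
        captured.append(running / total_pos)
        start = end
    return captured
```

Reading the third entry of the returned list gives the share of responders reached by contacting only the top 30% of scored customers, which is the number a targeted campaign would weigh against its outreach cost.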
Detect fraudulent transactions with SMOTE-balanced models and SHAP explainability on 10K synthetic transactions.
Retrieval-augmented generation system using TF-IDF and BM25 over municipal policy documents. No external API dependencies.
End-to-end ML pipeline with PSI-based drift detection, model versioning, champion/challenger deployment, and Docker containerization.
Multi-class sentiment classification on 5K product reviews with TF-IDF vectorization and SVM, achieving 89% accuracy.
Predict equipment failure from 15K sensor readings across 50 machines with survival analysis and cost-optimized thresholds.
Hybrid recommendation system combining collaborative filtering, content-based filtering, and SVD, built on 50K ratings from 2K users and 500 items.
Predict ride demand across 30 Calgary zones using spatial cross-validation, cyclical encoding, and LightGBM.
Predict employee disengagement from HR recognition data. Behavioral trend detection, department benchmarks, and retention recommendations.
Product analytics for SaaS platforms: cohort retention, feature adoption tracking, at-risk account identification, and usage dashboards.
Compute industry-standard KPI benchmarks across 8 sectors. Percentile rankings, peer comparison, gap analysis, and custom report generation.
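For readers unfamiliar with the PSI-based drift detection named in the ML pipeline project above, the following is a minimal sketch of the Population Stability Index in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function name `psi`, the equal-width binning over the baseline's range, and the `eps` floor for empty bins are all choices made here for the example.

```python
import math
from typing import List, Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline sample and a new sample,
    using equal-width bins that span the baseline's range."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread")
    width = (hi - lo) / n_bins

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the log is defined for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected).
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A value of zero means the two binned distributions match. Commonly cited rules of thumb treat values above roughly 0.25 as a significant shift, though the threshold a given pipeline uses is a design choice.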
Every project follows a disciplined end-to-end pipeline, from raw data through production-ready models and business insight.
Technical depth across the full machine learning lifecycle, from data engineering to model deployment.
Ola K. is a data scientist based in Canada with a background in mathematics, data science, and applied machine learning.
This portfolio was built to demonstrate end-to-end ML capabilities on real-world data. The first 13 projects use Calgary Open Data covering urban transportation, public safety, environment, and real estate. Projects 14 through 16 focus on customer analytics and experimentation for telecom and subscription businesses.
Core interests include customer analytics, experimentation design, production ML systems, and using statistical rigor to drive business decisions.
Currently seeking data science roles focused on customer analytics, experimentation, and production ML systems. Open to remote positions across Canada.