Project 23

Geospatial demand forecast

A ride demand forecasting pipeline using LightGBM with spatial cross-validation and cyclical temporal encoding, trained on 20K records across 30 Calgary neighborhoods.

Tags: Demand forecasting, LightGBM, Spatial CV, Cyclical encoding, Geospatial, Regression
R-squared, best model (LightGBM): 0.85

Interactive dashboard

Five-page Streamlit application for demand visualization and zone-level forecasting

Demand heatmap
  • Mapbox scatter of predicted demand by zone
  • Hour-of-day slider for temporal exploration
  • Demand distribution across all hours
Zone forecast timeline
  • 24-hour demand profile per zone
  • Confidence band (+/- 1 std)
  • Day-of-week demand patterns
Feature importance
  • LightGBM feature importance rankings
  • Pearson correlation with demand target
  • Spatial vs temporal feature analysis
Peak demand alerts
  • Configurable demand threshold alerts
  • Zone-hour heatmap for peak identification
  • Top alert table with affected zones
$ pip install -r requirements.txt && streamlit run app.py

Key results

LightGBM with spatial features and cyclical encoding on 20K ride demand records

  • R-squared: 0.85
  • MAE: 4.2 rides/zone/hour
  • RMSE: 5.8
  • Calgary zones: 30
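
For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how R-squared, MAE, and RMSE are conventionally computed from true and predicted demand values. The function name `regression_metrics` and the sample numbers are illustrative assumptions, not taken from the project code, which may use library implementations instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R-squared for two equal-length sequences.

    Hypothetical helper for illustration; not part of the project.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of the miss, in the target's units.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large misses more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R-squared: share of the target's variance explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up rides/zone/hour values, purely for demonstration.
metrics = regression_metrics([10, 20, 30], [12, 18, 33])
print(metrics)
```

RMSE is never smaller than MAE on the same predictions, so a gap between the two suggests that a few zone-hours carry larger errors than the rest.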

Methodology

The dataset is synthetic: 20K ride demand records across 30 Calgary neighborhoods, with features covering geographic coordinates, temporal patterns, weather conditions, and local infrastructure. Spatial features include haversine distance to downtown and KMeans zone clusters. Temporal features use cyclical sine/cosine encoding for hour, day, and month so that values at the ends of a cycle (for example, hour 23 and hour 0) stay close together. Four regressors (Ridge, Random Forest, XGBoost, LightGBM) are compared with 5-fold cross-validation. Leave-one-zone-out spatial CV then tests how well the model generalizes to geographic areas it has not seen.
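
The two feature ideas described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's implementation: the function names `haversine_km` and `cyclical_encode` are hypothetical, and the downtown reference point is an approximate Calgary city-centre coordinate chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumption: approximate Calgary city-centre coordinates (lat, lon).
# The project's actual downtown reference point is not given in the text.
DOWNTOWN = (51.0447, -114.0719)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour with period 24) to (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Example features for one hypothetical record.
zone_lat, zone_lon = 51.08, -114.13  # made-up zone centroid
features = {
    "dist_downtown_km": haversine_km(zone_lat, zone_lon, *DOWNTOWN),
    "hour_sin": cyclical_encode(23, 24)[0],
    "hour_cos": cyclical_encode(23, 24)[1],
}
print(features)
```

With a raw integer hour, 23 and 0 sit 23 units apart; in the sine/cosine plane they are as close as any other pair of adjacent hours, which is the circular relationship the encoding is meant to keep.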

  • Data + features: 20K records, 30 zones
  • Spatial encoding: Distance, KMeans clusters
  • Model training: Ridge, RF, XGB, LGBM
  • Spatial CV: Leave-one-zone-out
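
To make the leave-one-zone-out step concrete, here is a minimal sketch of how such splits can be generated: each fold holds out every record from one zone and trains on the rest. The generator name `leave_one_zone_out` and the zone labels are hypothetical; the project's own splitting code is not shown in this page.

```python
from collections import defaultdict


def leave_one_zone_out(zones):
    """Yield (held_out_zone, train_idx, test_idx) once per distinct zone.

    `zones` holds the zone label of each record, in record order.
    """
    by_zone = defaultdict(list)
    for i, zone in enumerate(zones):
        by_zone[zone].append(i)

    for zone, test_idx in by_zone.items():
        held_out = set(test_idx)
        # Train on every record that does not belong to the held-out zone.
        train_idx = [i for i in range(len(zones)) if i not in held_out]
        yield zone, train_idx, test_idx


# Hypothetical zone labels for five records.
zones = ["zone_a", "zone_b", "zone_a", "zone_c", "zone_b"]
splits = list(leave_one_zone_out(zones))
for zone, train_idx, test_idx in splits:
    print(zone, "train:", train_idx, "test:", test_idx)
```

Because no record from the held-out zone appears in training, the score on that fold reflects performance on an unseen area, unlike a random 5-fold split where every zone leaks into both sides. scikit-learn's `LeaveOneGroupOut` produces the same kind of split when the zone labels are passed as `groups`.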

How to run

$ git clone https://github.com/guydev42/geospatial-demand-forecast.git
$ cd calgary-data-portfolio/project_23_geospatial_demand_forecast
$ pip install -r requirements.txt
$ python data/generate_data.py
$ streamlit run app.py

Data source

The ride demand data is synthetic, built from realistic distributions based on Calgary geography. The 30 neighborhoods use real centroid coordinates, and demand patterns are shaped by distance to downtown, population density, restaurant and transit infrastructure, hourly rush cycles, weather conditions, and event proximity. Calgary's seasonal climate (cold winters, warm summers) influences both temperature distributions and seasonal demand multipliers.
