A ride demand forecasting pipeline using LightGBM with spatial cross-validation and cyclical temporal encoding, trained on 20K records across 30 Calgary neighborhoods.
Five-page Streamlit application for demand visualization and zone-level forecasting
LightGBM with spatial features and cyclical encoding on 20K ride demand records
The dataset is synthetic: 20K ride demand records across 30 Calgary neighborhoods, with features covering geographic coordinates, temporal patterns, weather conditions, and local infrastructure. Spatial features include haversine distance to downtown and KMeans zone clusters. Temporal features use cyclical sine/cosine encoding for hour, day, and month, so that values adjacent across the wrap-around (for example, hour 23 and hour 0) stay close in feature space. Four regressors (Ridge, Random Forest, XGBoost, LightGBM) are compared with 5-fold cross-validation, and leave-one-zone-out spatial CV then checks generalization to geographic areas unseen during training.
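The feature and validation ideas described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function names, the `DOWNTOWN` coordinates (an approximate value for central Calgary), and the zone labels in the usage note are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Approximate downtown Calgary coordinates; an assumption for illustration.
DOWNTOWN = (51.0447, -114.0719)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cyclical(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def leave_one_zone_out(zones):
    """Yield (held_out_zone, train_idx, test_idx), one fold per distinct zone."""
    for held_out in sorted(set(zones)):
        train = [i for i, z in enumerate(zones) if z != held_out]
        test = [i for i, z in enumerate(zones) if z == held_out]
        yield held_out, train, test
```

For instance, `cyclical(23, 24)` and `cyclical(0, 24)` land next to each other on the unit circle, whereas the raw values 23 and 0 sit at opposite ends of the hour range. Calling `leave_one_zone_out(["a", "b", "a", "c"])` with hypothetical zone labels produces three folds, each holding out every record from one zone, so no fold trains on records from the area it is tested on.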
The ride demand data is synthetic, drawn from realistic distributions based on Calgary geography. The 30 neighborhoods use real centroid coordinates, and demand patterns are shaped by distance to downtown, population density, restaurant and transit infrastructure, hourly rush cycles, weather conditions, and event proximity. Calgary's seasonal climate (cold winters, warm summers) influences both the temperature distributions and the seasonal demand multipliers.
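As a rough illustration of how multiplicative demand factors like these can be combined, the sketch below generates a single synthetic demand value from a base rate, a distance-to-downtown decay, and an hourly rush-cycle multiplier. Every function name, constant, and functional form here is a hypothetical choice for the example; the actual generator's distributions and parameters are not specified in this description.

```python
import math
import random


def hourly_multiplier(hour):
    """Two rush-hour bumps (around 8:00 and 17:00) on a baseline of 1.0.

    The peak times, widths, and heights are assumed values.
    """
    morning = math.exp(-((hour - 8) ** 2) / 8.0)
    evening = math.exp(-((hour - 17) ** 2) / 8.0)
    return 1.0 + 0.8 * morning + 1.0 * evening


def distance_multiplier(dist_km, scale_km=10.0):
    """Demand decays exponentially with distance from downtown (assumed form)."""
    return math.exp(-dist_km / scale_km)


def simulate_demand(base, dist_km, hour, rng):
    """Draw one non-negative integer demand value around the expected rate."""
    expected = base * distance_multiplier(dist_km) * hourly_multiplier(hour)
    # Gaussian noise with variance equal to the mean, a rough stand-in
    # for count-like variability; negative draws are clipped to zero.
    noisy = rng.gauss(expected, math.sqrt(expected))
    return max(0, round(noisy))


rng = random.Random(42)  # fixed seed so the synthetic data is reproducible
sample = [simulate_demand(50.0, 3.5, hour, rng) for hour in range(24)]
```

Further factors such as weather, season, or event proximity would enter the same way, as additional multipliers on the expected rate. Seeding the generator keeps the synthetic dataset identical across runs.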