
Hybrid recommendation engine

A recommendation system combining collaborative filtering, content-based filtering, and SVD matrix factorization, trained on 50K ratings across 2K users and 500 items.

Collaborative filtering · Content-based · SVD · Hybrid scoring · Cold start · TF-IDF
NDCG@10 (hybrid model): 0.78

Interactive dashboard

Five-page Streamlit application for personalized recommendations and model analysis

User recommendations
  • Select user and adjust hybrid weights
  • Top-N personalized item recommendations
  • Category breakdown of recommended items
Because you liked
  • Explainable "because you liked X" reasoning
  • Content similarity scores for each recommendation
  • Visual similarity bar chart
Item similarity
  • Browse items by category and find similar ones
  • Pairwise similarity heatmap
  • TF-IDF content similarity scores
Metrics comparison
  • Precision@K, Recall@K, NDCG@K per method
  • Radar chart comparing all five methods
  • SVD RMSE and MAE statistics
Coverage and diversity
  • Catalog coverage and recommendation diversity statistics
  • Rating distributions, ratings per user, category breakdowns
  • User demographics and item price tier analysis
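The "because you liked" and item-similarity pages both rest on TF-IDF cosine similarity between item descriptions. A minimal sketch of the explanation step, assuming scikit-learn, made-up descriptions, and a hypothetical helper `because_you_liked` (not the project's actual code): for a recommended item, it picks the previously liked item with the highest content similarity and reports that score.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up item descriptions standing in for the 500-item catalog.
descriptions = [
    "wireless noise cancelling headphones",
    "bluetooth wireless earbuds with charging case",
    "stainless steel chef knife",
    "nonstick frying pan for induction cooktops",
]

# TF-IDF vector per item, then the full item-item cosine similarity matrix.
vectorizer = TfidfVectorizer(stop_words="english")
item_vectors = vectorizer.fit_transform(descriptions)
item_sim = cosine_similarity(item_vectors)  # shape (n_items, n_items)


def because_you_liked(candidate: int, liked: list[int]) -> tuple[int, float]:
    """Hypothetical helper: the liked item most similar to `candidate`, plus its score."""
    sims = item_sim[candidate, liked]
    best = int(np.argmax(sims))
    return liked[best], float(sims[best])


anchor, score = because_you_liked(candidate=1, liked=[0, 2])
print(f"Item 1 recommended because you liked item {anchor} (similarity {score:.2f})")
```

Here the earbuds are explained by the headphones because the two descriptions share the term "wireless", while the chef knife shares no terms and scores zero.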
$ pip install -r requirements.txt && streamlit run app.py

Key results

Results on 50K ratings across 2K users and 500 items (NDCG@10 for the hybrid model, RMSE for SVD alone)

  • NDCG@10: 0.78
  • RMSE (SVD): 0.91
  • Matrix sparsity: 95%
  • Ratings: 50K
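The NDCG@10 and RMSE figures can be computed along these lines. This is a sketch that assumes binary relevance (an item counts as relevant if it is in the user's held-out set); the project may define relevance differently, for example from graded ratings, and the helper names are hypothetical. The last line checks the sparsity figure: 50K ratings fill 5% of a 2K x 500 matrix.

```python
import numpy as np


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG@K: DCG of the top-k list divided by the ideal DCG."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    # The item at 0-based position i is discounted by log2(i + 2).
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return float(dcg / idcg) if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error over held-out (observed) ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hits at ranks 1 and 4, with two relevant items in total.
print(round(ndcg_at_k([3, 7, 1, 9], relevant={3, 9}, k=4), 3))  # 0.877
print(round(rmse([4, 3, 5], [3.5, 3.0, 4.0]), 3))               # 0.645

# Sparsity implied by the dataset size: 50K ratings in a 2K x 500 matrix.
print(f"{1 - 50_000 / (2_000 * 500):.0%}")  # 95%
```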

Methodology

Synthetic user-item interaction data (2K users, 500 items, 50K ratings) with latent category preferences driving realistic rating patterns. Three recommendation approaches are combined: collaborative filtering (user-based and item-based cosine similarity on the sparse rating matrix), content-based filtering (TF-IDF on item descriptions with cosine similarity), and matrix factorization (truncated SVD with 50 latent factors). The hybrid model weights these signals (40% CF + 20% content + 40% SVD) and includes a popularity-based cold-start fallback for new users.
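The weighted fusion and cold-start fallback can be sketched as follows, assuming each model has already produced one score per item for the user. Min-max rescaling before blending is an assumption (CF, content, and SVD scores live on different scales); the 0.4/0.2/0.4 weights are the ones quoted in the methodology, and `hybrid_top_n` is a hypothetical name.

```python
import numpy as np

# Blend weights quoted in the methodology: 40% CF, 20% content, 40% SVD.
WEIGHTS = {"cf": 0.4, "content": 0.2, "svd": 0.4}


def min_max(scores):
    """Rescale to [0, 1] so signals on different scales can be blended (an assumption)."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return np.zeros_like(scores) if hi == lo else (scores - lo) / (hi - lo)


def hybrid_top_n(cf, content, svd, seen, popularity, n=10, weights=WEIGHTS):
    """Hypothetical combiner: weighted fusion, or popularity for cold-start users."""
    if not seen:
        # Cold start: no rating history, so rank purely by item popularity.
        ranked = np.argsort(-np.asarray(popularity, dtype=float))
    else:
        blended = (weights["cf"] * min_max(cf)
                   + weights["content"] * min_max(content)
                   + weights["svd"] * min_max(svd))
        blended[list(seen)] = -np.inf  # push already-rated items to the bottom
        ranked = np.argsort(-blended)
    return [int(i) for i in ranked[:n]]


cf = np.array([0.9, 0.1, 0.5, 0.3])
content = np.array([0.2, 0.8, 0.4, 0.0])
svd = np.array([4.1, 2.0, 3.9, 3.0])
popularity = np.array([120, 300, 80, 50])  # e.g. rating counts per item
print(hybrid_top_n(cf, content, svd, seen={0}, popularity=popularity, n=2))    # [2, 3]
print(hybrid_top_n(cf, content, svd, seen=set(), popularity=popularity, n=2))  # [1, 0]
```

Already-rated items get `-inf` so they sink to the bottom of the ranking; a user with no history takes the popularity branch instead of the blend.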

  • Data + sparse matrix: 50K ratings, 95% sparse
  • CF + content + SVD: three independent models
  • Hybrid combiner: weighted score fusion
  • Evaluation: NDCG, RMSE, coverage
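The first two pipeline stages might look like the sketch below, assuming SciPy and scikit-learn: a CSR rating matrix built from (user, item, rating) triples, item-based cosine CF, and truncated SVD scores. Only item-based CF is shown (the project also uses user-based CF), the toy data is invented, and the function names are hypothetical. Plain truncated SVD treats unrated cells as zeros, which is a simplification compared with factorization methods that fit only observed ratings.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity


def build_matrix(user_ids, item_ids, ratings, n_users, n_items):
    """Users x items CSR matrix; unrated cells are implicit zeros."""
    data = np.asarray(ratings, dtype=float)
    return csr_matrix((data, (user_ids, item_ids)), shape=(n_users, n_items))


def item_cf_scores(R, user):
    """Item-based CF: similarity-weighted average of the user's own ratings."""
    item_sim = cosine_similarity(R.T)  # dense (n_items, n_items) array
    np.fill_diagonal(item_sim, 0.0)    # an item should not vote for itself
    user_row = R[user].toarray().ravel()
    rated = (user_row > 0).astype(float)
    # Small epsilon keeps items with no similar rated neighbours at score 0.
    return (item_sim @ user_row) / (item_sim @ rated + 1e-9)


def svd_scores(R, user, n_factors=50, seed=0):
    """Truncated SVD: rebuild the user's row from n_factors latent factors."""
    svd = TruncatedSVD(n_components=n_factors, random_state=seed)
    user_factors = svd.fit_transform(R)          # (n_users, n_factors)
    return user_factors[user] @ svd.components_  # (n_items,)


# Toy example: 4 users, 5 items, 7 ratings.
R = build_matrix([0, 0, 1, 1, 2, 2, 3], [0, 1, 1, 2, 0, 3, 4],
                 [5, 4, 4, 2, 5, 3, 4], n_users=4, n_items=5)
print(item_cf_scores(R, user=0).round(2))
print(svd_scores(R, user=0, n_factors=2).round(2))
```

The project uses 50 latent factors; the toy call uses 2 because `n_components` cannot exceed the number of items.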

How to run

$ git clone https://github.com/guydev42/recommendation-engine.git
$ cd recommendation-engine
$ pip install -r requirements.txt
$ python data/generate_data.py
$ streamlit run app.py

Data source

Synthetic recommendation data built from realistic distributions. User activity follows a power-law pattern where a minority of users contribute most ratings. Item popularity exhibits a long-tail distribution. Ratings are influenced by latent user-category preferences with Gaussian noise, producing the correlation structure that collaborative filtering and SVD are designed to exploit. The 95% sparsity level is in line with small public rating benchmarks such as MovieLens 100K (about 94% sparse), although production e-commerce matrices are usually far sparser.
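A generator with the properties described above might look like this. It is a sketch under stated assumptions, not `data/generate_data.py` itself: Zipf-style weights stand in for the power-law user activity and long-tail item popularity, each item gets one random category, and every parameter other than the 2K/500/50K sizes (exponents, 8 categories, noise level, the 1 to 5 star scale) is illustrative.

```python
import numpy as np


def generate_ratings(n_users=2_000, n_items=500, n_ratings=50_000,
                     n_categories=8, noise_sd=0.5, seed=42):
    """Synthetic (user, item, rating) triples; assumes n_ratings <= n_users * n_items."""
    rng = np.random.default_rng(seed)

    # Zipf-style weights: a few heavy users and head items, long tails for the rest.
    user_w = 1.0 / np.arange(1, n_users + 1) ** 0.8
    item_w = 1.0 / np.arange(1, n_items + 1) ** 1.0
    user_p, item_p = user_w / user_w.sum(), item_w / item_w.sum()

    # Each item belongs to one category; each user has a latent affinity per category.
    item_cat = rng.integers(0, n_categories, size=n_items)
    pref = rng.normal(0.0, 1.0, size=(n_users, n_categories))

    # Draw (user, item) pairs by popularity until enough distinct pairs exist.
    codes = np.empty(0, dtype=np.int64)
    while codes.size < n_ratings:
        u = rng.choice(n_users, size=n_ratings, p=user_p)
        i = rng.choice(n_items, size=n_ratings, p=item_p)
        codes = np.unique(np.concatenate([codes, u * n_items + i]))
    codes = rng.permutation(codes)[:n_ratings]
    users, items = codes // n_items, codes % n_items

    # Rating = baseline + category affinity + Gaussian noise, rounded and clipped to 1-5.
    raw = 3.0 + pref[users, item_cat[items]] + rng.normal(0.0, noise_sd, size=n_ratings)
    ratings = np.clip(np.rint(raw), 1, 5).astype(int)
    return users, items, ratings


users, items, ratings = generate_ratings()
print(len(ratings), f"{1 - len(ratings) / (2_000 * 500):.0%}")  # 50000 95%
```

Encoding each pair as `user * n_items + item` lets `np.unique` drop duplicate ratings cheaply, so every user rates a given item at most once.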
