A recommendation system combining collaborative filtering, content-based filtering, and SVD matrix factorization, trained on 50K ratings across 2K users and 500 items.
Five-page Streamlit application for personalized recommendations and model analysis
User recommendations
- Select a user and adjust hybrid weights
- Top-N personalized item recommendations
- Category breakdown of recommended items

Because you liked
- Explainable "because you liked X" reasoning
- Content similarity scores for each recommendation
- Visual similarity bar chart

Item similarity
- Browse items by category and find similar ones
- Pairwise similarity heatmap
- TF-IDF content similarity scores

Metrics comparison
- Precision@K, Recall@K, and NDCG@K per method
- Radar chart comparing all five methods
- SVD RMSE and MAE statistics

Coverage and diversity
- Catalog coverage and recommendation diversity statistics
- Rating distributions, ratings per user, category breakdowns
- User demographics and item price tier analysis
$ pip install -r requirements.txt && streamlit run app.py
Key results
Hybrid model on 50K ratings across 2K users and 500 items
0.78  NDCG@10
0.91  RMSE (SVD)
95%   Matrix sparsity
50K   Ratings
Methodology
The system trains on synthetic user-item interaction data (2K users, 500 items, 50K ratings) in which latent category preferences drive realistic rating patterns. Three recommendation approaches are combined: collaborative filtering (user-based and item-based cosine similarity on the sparse rating matrix), content-based filtering (TF-IDF on item descriptions with cosine similarity), and matrix factorization (truncated SVD with 50 latent factors). The hybrid model fuses these signals with fixed weights (40% CF + 20% content + 40% SVD) and falls back to popularity-based recommendations for cold-start users with no rating history.
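The weighted fusion step can be sketched as follows. This is a minimal illustration of the 40/20/40 combination described above, not the project's actual API; the function name and min-max normalization are assumptions.

```python
import numpy as np

def hybrid_scores(cf_scores, content_scores, svd_scores,
                  w_cf=0.4, w_content=0.2, w_svd=0.4):
    """Fuse per-item scores from the three models into one ranking signal.

    Each input is a 1-D array of scores over the same item ordering.
    Scores are min-max normalized first so the weights are comparable
    across models that produce scores on different scales.
    """
    def normalize(s):
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    return (w_cf * normalize(cf_scores)
            + w_content * normalize(content_scores)
            + w_svd * normalize(svd_scores))

# Rank items for one user: highest hybrid score first.
cf = [3.0, 1.0, 2.0]
content = [0.2, 0.9, 0.5]
svd = [4.5, 2.0, 3.5]
top = np.argsort(hybrid_scores(cf, content, svd))[::-1]  # items 0, 2, 1
```

Normalizing before weighting matters: raw CF scores (rating-scale) and content similarities (0-1) would otherwise let one signal dominate regardless of the weights.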
Data + sparse matrix: 50K ratings, 95% sparse
CF + content + SVD: three independent models
Hybrid combiner: weighted score fusion
Evaluation: NDCG, RMSE, coverage
How to run
$ git clone https://github.com/guydev42/recommendation-engine.git
$ cd calgary-data-portfolio/project_22_recommendation_engine
$ pip install -r requirements.txt
$ python data/generate_data.py
$ streamlit run app.py
Data source
Synthetic recommendation data built from realistic distributions. User activity follows a power-law pattern where a minority of users contribute most ratings. Item popularity exhibits a long-tail distribution. Ratings are influenced by latent user-category preferences with Gaussian noise, producing the correlation structure that collaborative filtering and SVD are designed to exploit. The 95% sparsity level matches typical e-commerce recommendation datasets.
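The generation recipe above can be sketched roughly as follows: Zipf-distributed user activity and item popularity, with ratings drawn from latent user-category affinities plus Gaussian noise. All constants and names here are illustrative, not those in data/generate_data.py.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_cats = 2000, 500, 8

# Latent user-category affinities and item categories drive ratings.
user_prefs = rng.normal(3.5, 1.0, size=(n_users, n_cats))
item_cat = rng.integers(0, n_cats, size=n_items)

# Power-law activity: a minority of users contribute most ratings.
activity = rng.zipf(2.0, size=n_users)
# Long-tail item popularity, normalized into sampling weights.
popularity = rng.zipf(1.5, size=n_items).astype(float)
popularity /= popularity.sum()

rows = []
for u in range(n_users):
    n_r = min(int(activity[u]) + 1, n_items)
    items = rng.choice(n_items, size=n_r, replace=False, p=popularity)
    for i in items:
        # Rating = latent preference for the item's category + noise,
        # rounded and clipped to the 1-5 scale.
        r = user_prefs[u, item_cat[i]] + rng.normal(0, 0.5)
        rows.append((u, int(i), float(np.clip(round(r), 1, 5))))
```

Because ratings come from shared category affinities rather than independent noise, nearby users and items end up correlated, which is exactly the structure CF and SVD recover.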