Recommender system project for DSCI 553 - Foundations and Applications of Data Mining at the University of Southern California (USC).

starryjay/Yelp-Recommender

Yelp-Recommender

  • Goal: Given a user-business pair on Yelp, accurately predict the rating given to the business by the user.
    • Training dataset of 455,854 points
    • Validation dataset of 142,044 points
  • Weighted hybrid recommender system with model-based and item-based collaborative filtering components
  • Final RMSE: 0.98477 stars
  • Execution time: 96 seconds
  • Error distribution:
    Error range (stars) | Number of observations | Percentage of validation set
    >=0 and <=1         | 101,530                | 71.48%
    >1 and <=2          | 33,427                 | 23.53%
    >2 and <=3          | 6,327                  | 4.45%
    >3 and <=4          | 760                    | 0.54%
    >4 and <=5          | 0                      | 0.00%
  • Future improvements:
    • Incorporating more features from the dataset, such as the number of Yelp friends a user has and the compliments on their Yelp profile
    • Upgrading existing packages (XGBoost, PySpark, Scikit-Learn) to take advantage of the latest features
      • Fitting the XGBoost model with the reg:squarederror loss function, a learning rate scheduler, and a dynamic early stopping threshold
      • Weighting the hybrid recommender system dynamically, or using features to determine the weights
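
The weighted hybrid and the evaluation numbers above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the weight `alpha`, the clipping to the 1 to 5 star range, and the sample values are all assumptions made for illustration, since the README does not state the weights it uses. It uses only the Python standard library.

```python
import math
from collections import Counter


def blend(model_pred, item_pred, alpha=0.9):
    """Weighted hybrid of a model-based and an item-based prediction.

    alpha is a hypothetical weight; the README does not state the real one.
    Clipping to the 1-5 star range is also an assumption.
    """
    combined = alpha * model_pred + (1.0 - alpha) * item_pred
    return min(5.0, max(1.0, combined))


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def error_buckets(predicted, actual):
    """Count absolute errors per range: [0,1], (1,2], (2,3], (3,4], (4,5]."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        err = abs(p - a)
        # ceil maps an error in (k-1, k] to bucket k; an error of 0 joins bucket 1.
        counts[max(1, math.ceil(err))] += 1
    return {k: counts.get(k, 0) for k in range(1, 6)}


# Made-up sample predictions and true ratings, for illustration only.
model_preds = [4.5, 3.0, 2.0, 5.0]
item_preds = [3.5, 4.0, 1.0, 5.0]
actual = [5.0, 3.0, 4.0, 1.0]

blended = [blend(m, i, alpha=0.5) for m, i in zip(model_preds, item_preds)]
print("RMSE:", rmse(blended, actual))
print("Error buckets:", error_buckets(blended, actual))
```

Dividing each bucket count by the size of the validation set gives percentages like those in the error distribution above, for example 101,530 / 142,044 ≈ 71.48%.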
