This is the implementation for our paper HyperKDMA: Distilling Recommender Systems via Hypernetwork-based Teacher Assistants. In this work, we propose HyperKDMA, a distillation scheme that uses multiple hypernetwork-based teacher assistants to bridge the teacher-student gap in knowledge distillation for top-K recommendation. We verify the effectiveness of our method through experiments on three base models (BPR, NeuMF, and LightGCN) and two public datasets (CiteULike and Foursquare).
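For intuition only, the sketch below shows one way a hypernetwork-based teacher assistant can be built: a hypernetwork produces, per user/item, the weights of a small projection that maps student embeddings toward teacher embeddings. All class and variable names here are hypothetical and do not mirror the actual code in this repository; it is a minimal illustration of the idea, not the paper's implementation.

```python
# Illustrative sketch of a hypernetwork-generated teacher assistant (not the repo code).
import torch
import torch.nn as nn

class HyperTA(nn.Module):
    """Teacher assistant whose projection weights are generated by a hypernetwork,
    so each user/item gets its own (personalized) student -> TA -> teacher mapping."""
    def __init__(self, student_dim=20, ta_dim=110, teacher_dim=200, z_dim=16):
        super().__init__()
        # Hypernetwork layers: map a per-user/item code z to the TA projection weights.
        self.hyper_up = nn.Linear(z_dim, student_dim * ta_dim)
        self.hyper_out = nn.Linear(z_dim, ta_dim * teacher_dim)
        self.student_dim, self.ta_dim, self.teacher_dim = student_dim, ta_dim, teacher_dim

    def forward(self, student_emb, z):
        # Generate personalized weights, then project student -> TA -> teacher space.
        w1 = self.hyper_up(z).view(-1, self.student_dim, self.ta_dim)
        w2 = self.hyper_out(z).view(-1, self.ta_dim, self.teacher_dim)
        h = torch.relu(torch.bmm(student_emb.unsqueeze(1), w1))
        return torch.bmm(h, w2).squeeze(1)  # approximation of the teacher embedding

# Toy distillation loss: pull the TA's reconstruction toward the teacher embedding.
ta = HyperTA()
student_emb = torch.randn(4, 20)   # batch of student embeddings (dim 20)
teacher_emb = torch.randn(4, 200)  # corresponding teacher embeddings (dim 200)
z = torch.randn(4, 16)             # per-user/item codes fed to the hypernetwork
loss = nn.functional.mse_loss(ta(student_emb, z), teacher_emb)
```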
- Clone the repo
git clone https://github.com/hieunm44/hyperkdma.git
cd hyperkdma
- Generate datasets
python3 gen_dataset_seed.py
The dataset files will then be generated in the folder `datasets`.
- Train a teacher model
python3 main_no_KD.py --model BPR --dim 200 --dataset CiteULike
- Now you have different ways to train a student model, for example:
- Train a student model without KD
python3 main_no_KD.py --model BPR --dim 20 --dataset CiteULike
- Train a student model with KD using DE
python3 main_DE.py --model BPR --teacher_dim 200 --student_dim 20 --dataset CiteULike
- Train a student model with KD using HyperKDMA-DE
python3 main_DETA.py --model BPR --teacher_dim 200 --student_dim 20 --num_TAs 8 --dataset CiteULike
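To see how several teacher assistants can bridge the dimensionality gap between the teacher (200) and the student (20), the toy snippet below picks evenly spaced intermediate dimensions. This is only an illustration of the idea; the spacing actually used by the training script may differ.

```python
# Illustrative only: evenly spaced TA dimensions between teacher_dim=200 and student_dim=20.
teacher_dim, student_dim, num_TAs = 200, 20, 8
step = (teacher_dim - student_dim) / (num_TAs + 1)
ta_dims = [round(teacher_dim - step * (i + 1)) for i in range(num_TAs)]
print(ta_dims)  # [180, 160, 140, 120, 100, 80, 60, 40]
```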
We compare our model with the following competitors: Distillation Experts (DE), Personalized Hint Regression (PHR), Knowledge Distillation via Teacher Assistant (TAKD), and Densely Guided Knowledge Distillation (DGKD). HyperKDMA significantly outperforms these KD methods thanks to its personalized learning mechanism.