JDRec

This code was used for the offline experiments of JDRec (http://arxiv.org/abs/2102.xxxxx). You can use it to reproduce the offline experiments in the paper, or just to obtain the JDRec dataset for further research.

offline experiments

Three quick steps to reproduce the offline experiments in the paper:

1. Download the data: sh get_data.sh
2. Train the models: python main.py train
3. Evaluate the models: python main.py eval

With Python 3.8 and TensorFlow 2.2.0, we obtained the following results, which can serve as a baseline for further research on reinforcement learning for recommender systems:

Evaluator	click AUC
Pointwise Evaluator	0.7201
Listwise Evaluator	0.7248

Generator	Average CTR
Naive RL Generator	0.0390
CTR RL Generator	0.0650

JDRec dataset

If you only need the JDRec dataset, run the script get_data.sh, or download the data directly through the links in the script. Please cite the paper if you use the data in any way.

In addition to the information provided in the paper, here are more details about the JDRec dataset:

In our offline experiments, the data is stored as one-item-per-line CSV. Each line consists of 53 columns, in the following order:

Click, RerankIndex, Improv, RequestTime, SkuCategory1, SkuCategory2, SkuCategory3, SkuShopId, SkuVendorId, SkuBestProduct, SkuBrandId, PCtr, PCvr, PGmv, TotalValue, PCtrCtrInt, PCtrCvrInt, PCtrGmvInt, PageNum, PCtrInt, PCvrInt, PGmvInt, ValueInt, CidOneExpNum, CidOneClkNum, CidOneNoClkNum, CidOneExpGap, CidOneClkGap, CidOneClkTimestamp, CidTwoExpNum, CidTwoClkNum, CidTwoNoClkNum, CidTwoExpGap, CidTwoClkGap, CidTwoClkTimestamp, CidThreeExpNum, CidThreeClkNum, CidThreeNoClkNum, CidThreeExpGap, CidThreeClkGap, CidThreeClkTimestamp, BrandExpNum, BrandClkNum, BrandNoClkNum, BrandExpGap, BrandClkGap, BrandClkTimestamp, ProductExpNum, ProductClkNum, ProductNoClkNum, ProductExpGap, ProductClkGap, ProductClkTimestamp.
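To make the column order easier to work with, here is a minimal sketch that attaches the 53 names to each CSV line. It is not part of the repository code: the helper name read_items is hypothetical, and it assumes the files are comma-delimited with no header row. Values are left as strings, since the column types are not specified here.

```python
import csv

# Column order as listed above. The first 23 names are spelled out.
COLUMNS = [
    "Click", "RerankIndex", "Improv", "RequestTime",
    "SkuCategory1", "SkuCategory2", "SkuCategory3",
    "SkuShopId", "SkuVendorId", "SkuBestProduct", "SkuBrandId",
    "PCtr", "PCvr", "PGmv", "TotalValue",
    "PCtrCtrInt", "PCtrCvrInt", "PCtrGmvInt", "PageNum",
    "PCtrInt", "PCvrInt", "PGmvInt", "ValueInt",
]

# The remaining 30 names follow a regular prefix + suffix pattern.
for prefix in ("CidOne", "CidTwo", "CidThree", "Brand", "Product"):
    for suffix in ("ExpNum", "ClkNum", "NoClkNum",
                   "ExpGap", "ClkGap", "ClkTimestamp"):
        COLUMNS.append(prefix + suffix)


def read_items(handle):
    """Yield one dict per CSV line, keyed by column name.

    Assumes comma-delimited input without a header row (an assumption,
    not something stated in the dataset description).
    """
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} columns, got {len(row)}"
            )
        yield dict(zip(COLUMNS, row))
```

For example, after running get_data.sh you could pass an open file handle for one of the downloaded CSV files to read_items and inspect item["Click"] or item["RequestTime"] for each line.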

Each sample includes 44 items, so lines 1 through 44 belong to the first sample, lines 45 through 88 to the second, and so on. In each sample, the first 4 lines are the items finally selected by the online rerank module, and the following 40 lines are all the candidate items (including the 4 selected items). All samples are ordered by RequestTime.
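The 44-lines-per-sample layout can be sketched as follows. This is an illustration rather than the repository's own loader (see sample_tools.py for that): the function name iter_samples and the two constants are hypothetical, and the input can be any iterable of per-line rows.

```python
from itertools import islice

ITEMS_PER_SAMPLE = 44  # 4 selected items + 40 candidate items
NUM_SELECTED = 4       # lines chosen by the online rerank module


def iter_samples(rows):
    """Group per-item rows into (selected, candidates) pairs.

    `rows` is any iterable with one element per CSV line, in file order.
    """
    it = iter(rows)
    while True:
        block = list(islice(it, ITEMS_PER_SAMPLE))
        if not block:
            return
        if len(block) != ITEMS_PER_SAMPLE:
            raise ValueError(
                f"incomplete sample: got {len(block)} lines, "
                f"expected {ITEMS_PER_SAMPLE}"
            )
        # First 4 lines: selected items. Remaining 40: candidate items.
        yield block[:NUM_SELECTED], block[NUM_SELECTED:]
```

Because the samples are ordered by RequestTime, iterating this way preserves the chronological order of requests.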

If the CSV format is still unclear, you can use the one-sample-per-line JSON format instead. The two formats contain exactly the same data, except that the JSON format also includes column information and sample-granularity structure. To download the JSON dataset, replace every 'csv' in the links with 'json', for example: http://storage.jd.com/jdrec-json/train_0.json.
