how to get the dataset interactions.csv #5

Closed
gccrpm opened this issue Mar 14, 2018 · 5 comments

Comments


gccrpm commented Mar 14, 2018

Hi @mquad, I can't get the dataset from XING.com. Could you tell me how to get it?
Thank you!

Owner

mquad commented Apr 13, 2018

Hi @gccrpm, sadly the dataset has been removed by its owners (see #1). You can try other public datasets, such as the Retailrocket one (https://www.kaggle.com/retailrocket/ecommerce-dataset).
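
For anyone taking this route, here is a minimal sketch of reshaping Retailrocket events into the four columns that appear later in this thread (user_id, item_id, interaction_type, created_at). It rests on several assumptions that should be checked against your own download and against build_dataset.py: that events.csv has the columns timestamp (in milliseconds), visitorid, event, itemid and transactionid; that the output should be tab-separated; and that the integer codes in EVENT_TO_TYPE are acceptable. The function name convert_events and that mapping are hypothetical, not part of this repository.

```python
import csv
import io

# Hypothetical mapping from Retailrocket event names to integer codes.
# The codes expected by build_dataset.py are an assumption here.
EVENT_TO_TYPE = {"view": 1, "addtocart": 2, "transaction": 3}


def convert_events(src, dst):
    """Copy rows from a Retailrocket-style events file to an interactions file.

    src and dst are open text file objects. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    # Tab-separated output is an assumption; adjust to what the loader expects.
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["user_id", "item_id", "interaction_type", "created_at"])
    written = 0
    for row in reader:
        writer.writerow([
            row["visitorid"],
            row["itemid"],
            EVENT_TO_TYPE[row["event"]],
            int(row["timestamp"]) // 1000,  # milliseconds -> seconds
        ])
        written += 1
    return written


# Tiny in-memory demo with one made-up event row.
demo = io.StringIO(
    "timestamp,visitorid,event,itemid,transactionid\n"
    "1433221332117,257597,view,355908,\n"
)
out = io.StringIO()
count = convert_events(demo, out)
print(out.getvalue())
```

With real files, the same function can be called on `open("events.csv", newline="")` and `open("interactions.csv", "w", newline="")`.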

@simushga

Hi,

I tried to run "python build_dataset.py interactions.csv". It prints:

Loading interactions.csv
Building sessions
Original data:
Num items: 2314
Num users: 1819
Num sessions: 3189
Filtering data
Filtered data:
Num items: 0
Num users: 0
Num sessions: 0
Partitioning data
Write to disk

I would appreciate it if you could help me understand why the filtered data is all 0.

Here is what my data looks like:

user_id item_id interaction_type created_at
100001 214004541 1 1515827044
100002 214006968 1 1523192543
100003 214005492 1 1515076970

Thanks
Sima

@simushga

I found the answer to my question above. The filtering was too strict for my small sample dataset. Namely, these two filters were removing everything:

keep items with >=20 interactions
let's keep only returning users (with >= 5 sessions)
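
The effect of those two thresholds can be reproduced with a small sketch. This is not the code from build_dataset.py, only an illustration under stated assumptions: the function name filter_interactions and the session_id field are hypothetical (the script builds sessions itself), and the real script may apply further filters.

```python
from collections import Counter


def filter_interactions(rows, min_item_interactions=20, min_user_sessions=5):
    """Apply the two thresholds quoted above to a list of row dicts.

    Each row needs the keys user_id, item_id and session_id.
    """
    # Step 1: keep items with at least `min_item_interactions` interactions.
    item_counts = Counter(r["item_id"] for r in rows)
    rows = [r for r in rows if item_counts[r["item_id"]] >= min_item_interactions]

    # Step 2: keep users with at least `min_user_sessions` distinct sessions.
    sessions_per_user = {}
    for r in rows:
        sessions_per_user.setdefault(r["user_id"], set()).add(r["session_id"])
    return [
        r for r in rows
        if len(sessions_per_user[r["user_id"]]) >= min_user_sessions
    ]


# The three sample rows from the earlier comment; session ids are made up.
sample = [
    {"user_id": 100001, "item_id": 214004541, "session_id": 0},
    {"user_id": 100002, "item_id": 214006968, "session_id": 1},
    {"user_id": 100003, "item_id": 214005492, "session_id": 2},
]

strict = filter_interactions(sample)         # default thresholds: 20 and 5
relaxed = filter_interactions(sample, 1, 1)  # thresholds lowered for a tiny sample
print(len(strict), len(relaxed))             # prints: 0 3
```

On a small sample, every item has fewer than 20 interactions, so the first filter already empties the data; lowering both thresholds keeps the rows.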

Thanks

Owner

mquad commented Apr 26, 2018

Happy to see that you found the solution 😃

mquad closed this as completed Apr 26, 2018

chordou commented Dec 4, 2019

Sorry to bother you. Would you mind providing me with the dataset? Thank you for your help!
