Parameter settings: data pre-processing

(Default values are in ~/recommendation/properties/dataset.yaml)

The benchmark provides several arguments that control how the dataset is pre-processed. See below for details.

Required parameters

reprocess (bool) : Whether to redo the preprocessing with the new parameters instead of loading the cached files in ~/recommendation/process_dataset.
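The cache behaviour that `reprocess` controls can be pictured with a minimal sketch. It assumes the preprocessed result is stored as a single pickle file; `load_or_preprocess`, `preprocess_fn` and the file name `processed.pkl` are hypothetical names for illustration, not the benchmark's actual API or cache layout.

```python
import os
import pickle
import tempfile


def load_or_preprocess(cache_dir, reprocess, preprocess_fn):
    """Return preprocessed data, reusing a cached copy unless reprocess is True.

    Hypothetical helper: the real benchmark's cache format is not documented here.
    """
    cache_file = os.path.join(cache_dir, "processed.pkl")  # hypothetical file name
    if not reprocess and os.path.exists(cache_file):
        # A cached result exists and reprocessing was not requested: reuse it.
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    # Otherwise run preprocessing with the current parameters and refresh the cache.
    data = preprocess_fn()
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file, "wb") as f:
        pickle.dump(data, f)
    return data


if __name__ == "__main__":
    calls = []

    def fake_preprocess():
        # Stand-in for the real preprocessing; records how often it runs.
        calls.append(1)
        return {"runs": len(calls)}

    with tempfile.TemporaryDirectory() as d:
        load_or_preprocess(d, reprocess=False, preprocess_fn=fake_preprocess)  # computes
        load_or_preprocess(d, reprocess=False, preprocess_fn=fake_preprocess)  # cache hit
        load_or_preprocess(d, reprocess=True, preprocess_fn=fake_preprocess)   # recomputes
        print(len(calls))
```

With `reprocess` set to false, a changed parameter has no effect while a cached file exists, which is why the flag must be switched on after editing the values below.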

item_val (int) : Retain only items whose total number of interactions across all users exceeds item_val.
user_val (int) : Retain only users whose total number of interactions across all items exceeds user_val.
group_val (int) : Retain only item groups whose total number of interactions across all users exceeds group_val.
group_aggregation_threshold (int) : Groups that own fewer items than this value are merged into a single group called the ‘infrequent group’.
sample_size (float) : Fraction of the whole dataset to sample as a subset for training.
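The filtering, group-aggregation and sampling parameters above can be illustrated with a minimal sketch over a plain list of `(user, item)` pairs and an item-to-group dict. The data layout, the function names, the single filtering pass, the strict ">" comparisons (reading "exceed" literally) and the uniform sampling of interactions for `sample_size` are all assumptions for illustration, not the benchmark's implementation.

```python
import random
from collections import Counter


def filter_interactions(interactions, item_group, item_val, user_val, group_val):
    """Keep (user, item) pairs whose item, user and group counts exceed the thresholds.

    Assumption: counts are taken once on the unfiltered data (no repeated passes).
    """
    item_counts = Counter(item for _, item in interactions)
    user_counts = Counter(user for user, _ in interactions)
    group_counts = Counter(item_group[item] for _, item in interactions)
    return [
        (user, item)
        for user, item in interactions
        if item_counts[item] > item_val
        and user_counts[user] > user_val
        and group_counts[item_group[item]] > group_val
    ]


def aggregate_groups(item_group, group_aggregation_threshold, infrequent="infrequent group"):
    """Reassign items of groups owning fewer than the threshold items to one merged group."""
    group_sizes = Counter(item_group.values())  # number of items owned by each group
    return {
        item: group if group_sizes[group] >= group_aggregation_threshold else infrequent
        for item, group in item_group.items()
    }


def sample_subset(interactions, sample_size, seed=0):
    """Uniformly keep a sample_size fraction of interactions (seed is a hypothetical extra)."""
    rng = random.Random(seed)
    k = round(len(interactions) * sample_size)  # rounded to the nearest whole interaction
    return rng.sample(interactions, k)
```

Counting everything on the unfiltered data keeps the sketch short; a real pipeline may instead repeat the filters until no further users or items drop out.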

valid_ratio (float) : Fraction of the data assigned to the validation set.
test_ratio (float) : Fraction of the data assigned to the test set.
sample_num (int) : Number of negative samples used for ranking-based evaluation.
history_length (int) : Maximum length of a user’s item interaction history; longer histories are truncated to this length.
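The split, negative-sampling and history parameters can likewise be sketched in plain Python. This assumes a random (not chronological or per-user) split of interactions, negatives drawn without replacement from items the user never interacted with, and histories stored oldest-first; the function names are hypothetical and the benchmark may implement each step differently.

```python
import random


def split_dataset(interactions, valid_ratio, test_ratio, seed=0):
    """Shuffle interactions and cut them into train/valid/test by the given fractions."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    n_valid = round(len(shuffled) * valid_ratio)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]  # everything left over is used for training
    return train, valid, test


def sample_negatives(user_items, all_items, sample_num, seed=0):
    """Draw up to sample_num items the user has not interacted with (no replacement)."""
    rng = random.Random(seed)
    # Sorting makes the candidate order, and hence the sample, reproducible per seed.
    candidates = sorted(set(all_items) - set(user_items))
    return rng.sample(candidates, min(sample_num, len(candidates)))


def truncate_history(history, history_length):
    """Keep only the most recent history_length interactions (history is oldest-first)."""
    # Guard against history[-0:], which would return the whole list.
    return history[-history_length:] if history_length > 0 else []
```

In a ranking-based evaluation of this kind, each held-out positive item is typically scored against its `sample_num` negatives, so larger values make the ranking task harder and the evaluation slower.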