Parameter settings: data pre-processing
(Default values are in ~/recommendation/properties/dataset.yaml)
The benchmark provides several arguments for the basic data pre-processing settings.
See below for details:
Required parameters
Cache setup
reprocess (bool)
: Whether to redo the preprocessing with the new parameters instead of reusing the cached files in ~/recommendation/process_dataset.
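The cache behaviour can be illustrated with a minimal Python sketch. The function name load_or_preprocess, the pickle format and the cache file name processed.pkl are hypothetical and do not come from the benchmark. The sketch assumes only that cached files are reused whatever the current parameters are, unless reprocess is set.

```python
import os
import pickle
from typing import Any, Callable, Dict


def load_or_preprocess(
    cache_dir: str,
    params: Dict[str, Any],
    preprocess: Callable[[Dict[str, Any]], Any],
    reprocess: bool = False,
) -> Any:
    """Return preprocessed data, reusing the cached copy unless reprocess is True."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical cache file name; the benchmark's real naming scheme may differ.
    cache_file = os.path.join(cache_dir, "processed.pkl")
    # The cache is NOT keyed on params, so it is reused even if params changed.
    if not reprocess and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    # Run the (caller-supplied) preprocessing and overwrite the cache.
    data = preprocess(params)
    with open(cache_file, "wb") as f:
        pickle.dump(data, f)
    return data
```

In this sketch, a second call with a different item_val still returns the old cached result. The new parameters take effect only when reprocess=True is passed.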
Filtering setup
item_val (int)
: Retain an item in the dataset only if its total number of interactions across all users exceeds item_val.
user_val (int)
: Retain a user in the dataset only if their total number of interactions across all items exceeds user_val.
group_val (int)
: Retain an item group in the dataset only if its total number of interactions across all users exceeds group_val.
group_aggregation_threshold (int)
: Groups that own fewer items than this value are merged into a single group called the "infrequent group".
sample_size (float)
: Fraction of the whole dataset to sample as a smaller subset for training.
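The filtering options above can be sketched in Python under several assumptions that are not stated in this page. Interactions are taken to be (user, item) pairs, and a dict maps each item to its group. Filtering is done in one pass in the order items, then users, then groups. "Exceed" is read as strictly greater than. sample_size is applied uniformly at random to interactions. The function names filter_interactions, aggregate_small_groups and sample_subset are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id)


def filter_interactions(
    interactions: List[Interaction],
    item_group: Dict[str, str],
    item_val: int,
    user_val: int,
    group_val: int,
) -> List[Interaction]:
    """One pass of item, user and group filtering (the order is an assumption)."""
    # Keep items whose interaction count exceeds item_val.
    item_counts = Counter(i for _, i in interactions)
    kept = [(u, i) for u, i in interactions if item_counts[i] > item_val]
    # Keep users whose remaining interaction count exceeds user_val.
    user_counts = Counter(u for u, _ in kept)
    kept = [(u, i) for u, i in kept if user_counts[u] > user_val]
    # Keep groups whose total interaction count exceeds group_val.
    group_counts = Counter(item_group[i] for _, i in kept)
    return [(u, i) for u, i in kept if group_counts[item_group[i]] > group_val]


def aggregate_small_groups(
    item_group: Dict[str, str], threshold: int, name: str = "infrequent group"
) -> Dict[str, str]:
    """Relabel items whose group owns fewer than `threshold` items."""
    group_sizes = Counter(item_group.values())
    return {
        item: (name if group_sizes[g] < threshold else g)
        for item, g in item_group.items()
    }


def sample_subset(
    interactions: List[Interaction], sample_size: float, seed: int = 0
) -> List[Interaction]:
    """Randomly keep a `sample_size` fraction of the interactions, without replacement."""
    k = int(len(interactions) * sample_size)
    return random.Random(seed).sample(interactions, k)
```

Dropping items can push a user below user_val, and the reverse is also true. A real pipeline might therefore repeat the filters until nothing changes. The single pass here keeps the sketch short.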
Connect setup
valid_ratio (float)
: Proportion of the data used for the validation set.
test_ratio (float)
: Proportion of the data used for the test set.
sample_num (int)
: Number of negative samples drawn for ranking-based evaluation.
history_length (int)
: Length to which each user's interaction history with items is truncated.
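The split and evaluation options above can be illustrated with a minimal Python sketch. Several points are assumptions rather than documented behaviour. The split is a random shuffle, not a chronological one, with the test portion taken first. Negatives are drawn uniformly from items the user never interacted with. Histories are in chronological order, and truncation keeps the most recent entries. The function names split_by_ratio, sample_negatives and truncate_history are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def split_by_ratio(
    records: Sequence, valid_ratio: float, test_ratio: float, seed: int = 0
) -> Tuple[list, list, list]:
    """Shuffle records and split them into train/valid/test by the given ratios."""
    items = list(records)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    n_valid = int(len(items) * valid_ratio)
    test = items[:n_test]
    valid = items[n_test:n_test + n_valid]
    train = items[n_test + n_valid:]  # everything left over is used for training
    return train, valid, test


def sample_negatives(
    all_items: Sequence[str], positives: Set[str], sample_num: int, seed: int = 0
) -> List[str]:
    """Draw `sample_num` distinct items the user has not interacted with."""
    candidates = [i for i in all_items if i not in positives]
    # random.sample raises ValueError if fewer than sample_num candidates exist.
    return random.Random(seed).sample(candidates, sample_num)


def truncate_history(history: Sequence[str], history_length: int) -> List[str]:
    """Keep only the most recent `history_length` interactions."""
    # Guard against history_length == 0, since history[-0:] would return everything.
    return list(history[-history_length:]) if history_length > 0 else []
```

For example, with 10 records, valid_ratio=0.1 and test_ratio=0.2, this sketch yields 7 training, 1 validation and 2 test records.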