Parameter settings data pre-processing

(Default values are in ~/recommendation/properties/dataset.yaml)

The benchmark provides several arguments for describing:

  • Basic setting of the parameters

See below for the details:

Required parameters

Cache set ups

  • reprocess (bool) : Should the preprocessing be redone based on the new parameters instead of using the cached files in ~/recommendation/process_dataset

Filtering set ups

  • item_val (int) : Retain items in the dataset if their total interactions with all users exceed item_val.

  • user_val (int) : Retain users in the dataset if their total interactions with all items exceed user_val.

  • group_val (int) : Retain item groups in the dataset if their total interactions with all users exceed group_val.

  • group_aggregation_threshold (int) : If the number of items owned by a group is less than this value, those groups will be merged into a single group called the ‘infrequent group.’

  • sample_size (float) : Sample ratio of the whole dataset to form a new subset dataset for training.

Connect set ups

  • valid_ratio (float) : The ratio for validate set.

  • test_ratio (float) : The ratio for test set.

  • sample_num (int) : Negative sample numbers for ranking-based evaluation.

  • history_length (int) : The truncated length of a user’s interaction history with items.