Parameter settings for data pre-processing
==========================================

Default values are defined in ``~/recommendation/properties/dataset.yaml``.

The benchmark provides several arguments that control the basic settings of data pre-processing. Each one is described below.

Required parameters
----------------------

Cache setup
''''''''''''''''''
- ``reprocess (bool)`` : Whether to redo the preprocessing with the new parameters instead of reusing the cached files in ``~/recommendation/process_dataset``.
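
The role of ``reprocess`` can be pictured with a minimal sketch. This is not the benchmark's own code: the helper ``needs_preprocessing``, the file name ``processed.pkl``, and the choice to rebuild when no cached file exists are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def needs_preprocessing(reprocess: bool, cache_file: Path) -> bool:
    """Return True when the dataset should be rebuilt rather than loaded from cache."""
    # Assumption: rebuild when the user asks for it, or when no cache exists yet.
    return reprocess or not cache_file.exists()


# A temporary directory stands in for the real cache folder.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "processed.pkl"  # hypothetical file name
    first_run = needs_preprocessing(False, cache_file)   # no cache yet
    cache_file.write_bytes(b"cached")
    cached_run = needs_preprocessing(False, cache_file)  # cache can be reused
    forced_run = needs_preprocessing(True, cache_file)   # reprocess=True
```

In this sketch, changing a filtering or splitting parameter without setting ``reprocess`` to true would keep loading the old cached files, which is why the flag matters after any parameter change.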


Filtering setup
''''''''''''''''''
- ``item_val (int)`` : Retain items in the dataset if their total interactions with all users exceed item_val.
- ``user_val (int)`` : Retain users in the dataset if their total interactions with all items exceed user_val.
- ``group_val (int)`` : Retain item groups in the dataset if their total interactions with all users exceed group_val.
- ``group_aggregation_threshold (int)`` : Groups that own fewer items than this value are merged into a single group called the 'infrequent group'.
- ``sample_size (float)`` : Fraction of the full dataset to sample when forming a smaller subset for training.
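
The filtering thresholds can be illustrated with a short sketch. It assumes that interactions are ``(user, item)`` pairs, that counts are taken once on the unfiltered data, and that a group is kept when it owns at least ``group_aggregation_threshold`` items. The function names are hypothetical, and the benchmark may apply these steps differently (for example, iteratively).

```python
from collections import Counter


def filter_interactions(interactions, item_val, user_val):
    """Keep (user, item) pairs whose user and item both exceed the thresholds."""
    # Counts are computed once, on the unfiltered data (an assumption).
    item_counts = Counter(item for _, item in interactions)
    user_counts = Counter(user for user, _ in interactions)
    return [
        (user, item)
        for user, item in interactions
        if item_counts[item] > item_val and user_counts[user] > user_val
    ]


def merge_small_groups(item_to_group, group_aggregation_threshold,
                       merged_name="infrequent group"):
    """Relabel groups owning fewer items than the threshold as one merged group."""
    group_sizes = Counter(item_to_group.values())
    return {
        item: group if group_sizes[group] >= group_aggregation_threshold else merged_name
        for item, group in item_to_group.items()
    }


interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
kept = filter_interactions(interactions, item_val=1, user_val=0)

item_to_group = {"i1": "g1", "i2": "g1", "i3": "g2"}
groups = merge_small_groups(item_to_group, group_aggregation_threshold=2)
```

Here only item ``i1`` has more than one interaction, so ``kept`` contains the two pairs that involve it, and group ``g2`` (one item) is folded into the merged group.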


Connect setup
''''''''''''''''''
- ``valid_ratio (float)`` : Fraction of the data held out for the validation set.
- ``test_ratio (float)`` : Fraction of the data held out for the test set.
- ``sample_num (int)`` : Number of negative samples used for ranking-based evaluation.
- ``history_length (int)`` : Maximum length to which a user's item-interaction history is truncated.
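
As a rough sketch of how these four parameters could interact, the example below splits an ordered list of records, truncates a history, and draws negative items. Everything here is an assumption for illustration: the helper names, the ordered (non-shuffled) split, keeping the most recent items when truncating, and drawing negatives uniformly from items the user has not seen.

```python
import random


def split_by_ratio(records, valid_ratio, test_ratio):
    """Split an ordered list into train, validation, and test parts."""
    n_valid = int(len(records) * valid_ratio)
    n_test = int(len(records) * test_ratio)
    n_train = len(records) - n_valid - n_test
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])


def truncate_history(history, history_length):
    """Keep only the most recent history_length items (an assumption)."""
    return history[-history_length:] if history_length > 0 else []


def sample_negatives(all_items, seen_items, sample_num, seed=0):
    """Draw up to sample_num items the user has not interacted with."""
    rng = random.Random(seed)
    candidates = sorted(set(all_items) - set(seen_items))
    return rng.sample(candidates, min(sample_num, len(candidates)))


train, valid, test = split_by_ratio(list(range(10)), valid_ratio=0.1, test_ratio=0.2)
recent = truncate_history([1, 2, 3, 4, 5], history_length=3)
negatives = sample_negatives(["a", "b", "c", "d", "e"], ["a", "b"], sample_num=2)
```

With ten records, ``valid_ratio=0.1`` and ``test_ratio=0.2`` leave seven records for training, one for validation, and two for testing in this sketch.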