utils.utils

fairdiverse.search.utils.utils.get_metrics_20(csv_file_path)

Retrieves evaluation metrics from a CSV file for the top 20 documents.

Parameters:

csv_file_path – The path to the CSV file containing evaluation results.

Returns:

A tuple containing the mean values of alpha-nDCG@20, NRBP@20, ERR-IA@20, and strec@20.
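The averaging step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name `mean_metrics_at_20`, the column names in `METRIC_COLUMNS`, and the sample data are assumptions, and the helper takes an open file object rather than a path so the demo can run in memory.

```python
import csv
import io
from statistics import mean

# Assumed column names; the header of a real results file may differ.
METRIC_COLUMNS = ("alpha-nDCG@20", "NRBP@20", "ERR-IA@20", "strec@20")


def mean_metrics_at_20(csv_file):
    """Return the per-column means of the four @20 metrics as a tuple."""
    rows = list(csv.DictReader(csv_file))
    return tuple(
        mean(float(row[col]) for row in rows) for col in METRIC_COLUMNS
    )


# Hypothetical two-query results file, held in memory for the demo.
sample = io.StringIO(
    "topic,alpha-nDCG@20,NRBP@20,ERR-IA@20,strec@20\n"
    "1,0.50,0.25,0.25,0.75\n"
    "2,0.25,0.75,0.50,0.25\n"
)
alpha_ndcg, nrbp, err_ia, strec = mean_metrics_at_20(sample)
print(alpha_ndcg, nrbp, err_ia, strec)
```

Returning the values in a fixed order lets the caller unpack them directly, which matches the tuple described in the Returns entry.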

fairdiverse.search.utils.utils.get_rel_feat(path)

Loads and scales the relevance features from a CSV file.

Parameters:

path – Path to the CSV file containing the relevance features.

Returns:

A dictionary where the key is a tuple (query, doc) and the value is a list of features.
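To make the returned shape concrete, here is a hedged sketch that builds a `(query, doc)`-keyed dictionary from CSV rows. The file layout (a header row, then query, doc, and feature columns), the helper name `load_scaled_features`, and the choice of per-column min-max scaling are all assumptions; the documentation above does not say which scaling the function applies.

```python
import csv
import io


def load_scaled_features(csv_file):
    """Map (query, doc) to a list of min-max scaled feature values."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    keys, rows = [], []
    for query, doc, *values in reader:
        keys.append((query, doc))
        rows.append([float(v) for v in values])

    # Assumed scaling: min-max per feature column, mapping values to [0, 1].
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    scaled = {}
    for key, row in zip(keys, rows):
        scaled[key] = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
    return scaled


# Hypothetical feature file, held in memory for the demo.
sample = io.StringIO(
    "query,doc,f1,f2\n"
    "q1,d1,1.0,10.0\n"
    "q1,d2,3.0,30.0\n"
    "q2,d1,2.0,20.0\n"
)
rel_feat = load_scaled_features(sample)
print(rel_feat[("q2", "d1")])
```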

fairdiverse.search.utils.utils.load_embedding(filename, sep='\t')

Loads embeddings from a file.

Parameters:
  • filename – The embedding file name.

  • sep – The character used as the separator.

Returns:

A dictionary with the item name as key and the embedding vector as value.
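A minimal sketch of this kind of parser follows. It assumes each line holds an item name followed by the vector components, all separated by `sep`; the helper name `parse_embeddings` and the sample data are hypothetical, and the real file layout may differ.

```python
import io


def parse_embeddings(lines, sep="\t"):
    """Build {item name: embedding vector} from separator-delimited lines."""
    embeddings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, *values = line.split(sep)
        embeddings[name] = [float(v) for v in values]
    return embeddings


# Hypothetical two-item embedding file, held in memory for the demo.
sample = io.StringIO("doc1\t0.1\t0.2\ndoc2\t0.3\t0.4\n")
emb = parse_embeddings(sample)
print(emb)
```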

fairdiverse.search.utils.utils.pkl_load(filename)

Loads a pickle file and returns the data inside it.

Parameters:

filename – Path to the pickle file.

Returns:

The loaded data from the pickle file.

fairdiverse.search.utils.utils.pkl_save(data_dict, filename)

Saves a dictionary to a compressed pickle file.

Parameters:
  • data_dict – The dictionary to be saved.

  • filename – The path where the pickle file should be saved.
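The save and load pair can be illustrated with a round trip through a temporary file. This sketch assumes gzip as the compression format, which the documentation does not specify; `save_compressed` and `load_compressed` are hypothetical stand-ins, not the library functions.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(data, filename):
    """Pickle `data` into a gzip-compressed file (gzip is an assumption)."""
    with gzip.open(filename, "wb") as f:
        pickle.dump(data, f)


def load_compressed(filename):
    """Read back an object written by save_compressed."""
    with gzip.open(filename, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.pkl.gz")
    save_compressed({"q1": [0.1, 0.2]}, path)
    restored = load_compressed(path)

print(restored)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources.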

fairdiverse.search.utils.utils.read_rel_feat(path)

Reads relevance features from a CSV file and returns them in a nested dictionary format.

Parameters:

path – Path to the CSV file containing the relevance features.

Returns:

A nested dictionary where the key is a query and the value is another dictionary of documents and features.
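The nested layout described above (query, then document, then features) can be sketched as follows. The CSV column order, the header row, and the helper name `read_nested_features` are assumptions made for illustration.

```python
import csv
import io


def read_nested_features(csv_file):
    """Return {query: {doc: [feature, ...]}} from CSV rows."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    nested = {}
    for query, doc, *values in reader:
        nested.setdefault(query, {})[doc] = [float(v) for v in values]
    return nested


# Hypothetical feature file, held in memory for the demo.
sample = io.StringIO(
    "query,doc,f1,f2\n"
    "q1,d1,1.0,10.0\n"
    "q1,d2,3.0,30.0\n"
    "q2,d1,2.0,20.0\n"
)
nested = read_nested_features(sample)
print(nested["q1"]["d2"])
```

Compared with a flat dictionary keyed by (query, doc) tuples, the nested form makes it cheap to iterate over all documents of a single query.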

fairdiverse.search.utils.utils.remove_duplicate(input_path, output_path)

Removes duplicate documents in the ranking list.

Parameters:
  • input_path – The path to the input file containing the ranking list.

  • output_path – The path where the cleaned ranking list will be saved.
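One way to perform this kind of deduplication is sketched below. The file format is an assumption: whitespace-separated lines with the query ID in the first field and the document ID in the third, as in a TREC-style run file. The helper name `dedupe_ranking` is hypothetical, and the real function may treat ranks or malformed lines differently.

```python
import os
import tempfile


def dedupe_ranking(input_path, output_path):
    """Keep only the first occurrence of each (query, doc) pair."""
    seen = set()
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 3:
                continue  # drop lines that do not fit the assumed format
            key = (fields[0], fields[2])  # (query ID, document ID)
            if key not in seen:
                seen.add(key)
                dst.write(line)


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "run.txt")
    dst_path = os.path.join(tmp, "run.dedup.txt")
    with open(src_path, "w") as f:
        f.write(
            "1 Q0 docA 1 0.9 demo\n"
            "1 Q0 docB 2 0.8 demo\n"
            "1 Q0 docA 3 0.7 demo\n"  # duplicate of docA for query 1
            "2 Q0 docA 1 0.6 demo\n"  # same doc, different query: kept
        )
    dedupe_ranking(src_path, dst_path)
    with open(dst_path) as f:
        kept = f.read().splitlines()

print(kept)
```

This sketch leaves the rank column untouched, so ranks may have gaps after duplicates are dropped.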

fairdiverse.search.utils.utils.restore_doc_ids(order_str, id_dict)

Restores document IDs based on an ordered list of indices and a dictionary of document IDs.

Parameters:
  • order_str – A string representing the order of document indices.

  • id_dict – A dictionary mapping indices to document IDs.

Returns:

A list of document IDs in the restored order.
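The mapping from indices back to document IDs can be sketched as below. The encoding of `order_str` is not documented here, so this example assumes a JSON-encoded list of integer indices; the helper name `restore_ids` and the sample IDs are hypothetical.

```python
import json


def restore_ids(order_str, id_dict):
    """Translate an ordered list of indices into document IDs."""
    order = json.loads(order_str)  # assumed format: "[2, 0, 1]"
    return [id_dict[i] for i in order]


id_dict = {0: "doc-a", 1: "doc-b", 2: "doc-c"}
restored = restore_ids("[2, 0, 1]", id_dict)
print(restored)
```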

fairdiverse.search.utils.utils.split_list(origin_list, n)

Splits the input list into n sublists of approximately equal size.

Parameters:
  • origin_list – The original list to be split.

  • n – The number of sublists to split into.

Returns:

A list of sublists.
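A minimal sketch of such a split is shown below. It follows the parameter description in treating n as the number of sublists and uses a chunk size of ceil(len / n); the helper name `split_into_parts` is hypothetical and the library's exact chunking may differ.

```python
import math


def split_into_parts(items, n):
    """Split `items` into at most n contiguous sublists of near-equal size."""
    if not items:
        return []
    size = math.ceil(len(items) / n)  # length of every sublist but the last
    return [items[i:i + size] for i in range(0, len(items), size)]


parts = split_into_parts(list(range(10)), 3)
print(parts)
```

With this chunking the last sublist may be shorter than the others, and for some inputs fewer than n sublists are produced (for example, 5 items with n=4 gives 3 sublists).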