process_dataset

fairdiverse.search.utils.process_dataset.calculate_best_rank(qd, config)

Calculates the best ranking of documents for each query.

Parameters:
  • qd – A dictionary of div_query objects.

  • config – A dictionary containing configuration settings.
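For diversity-aware metrics such as alpha-nDCG (mentioned under get_stand_best_metric), an ideal ranking is commonly approximated greedily: at each position, pick the document whose subtopics add the most novelty-discounted gain. The sketch below illustrates that general technique only. Whether calculate_best_rank follows this exact procedure is an assumption, and greedy_best_rank, subtopics_of, and alpha are hypothetical names, not part of the fairdiverse API.

```python
def greedy_best_rank(doc_ids, subtopics_of, alpha=0.5):
    """Greedily order documents by novelty-discounted subtopic gain.

    doc_ids      -- candidate document IDs for one query (hypothetical input)
    subtopics_of -- maps doc ID -> set of subtopic IDs the doc is relevant to
    alpha        -- redundancy penalty used by alpha-nDCG style metrics
    """
    remaining = list(doc_ids)
    covered = {}   # subtopic -> how many already-ranked docs cover it
    ranking = []

    def gain(doc):
        # A subtopic seen k times before contributes (1 - alpha) ** k.
        return sum((1 - alpha) ** covered.get(s, 0)
                   for s in subtopics_of.get(doc, ()))

    while remaining:
        best = max(remaining, key=gain)  # ties keep the earliest candidate
        ranking.append(best)
        remaining.remove(best)
        for s in subtopics_of.get(best, ()):
            covered[s] = covered.get(s, 0) + 1
    return ranking


# Made-up example: d1 and d2 cover the same subtopic, d3 covers a new one.
subtopics = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
best_rank = greedy_best_rank(["d1", "d2", "d3"], subtopics)
```

In this example the document covering the unseen subtopic is promoted ahead of the redundant one.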

fairdiverse.search.utils.process_dataset.data_process(config)

Main function for processing query, document, and relevance data.

Parameters:

config – A dictionary containing configuration settings.

fairdiverse.search.utils.process_dataset.data_process_worker(task, data_dir)

Processes a list of queries and saves them as data files.

Parameters:
  • task – A list of query ID and div_query object pairs to process.

  • data_dir – The directory where processed query data will be saved.

fairdiverse.search.utils.process_dataset.generate_qd(config)

Generates a div_query file from the data directory.

Parameters:

config – A dictionary containing configuration settings.

Returns:

A dictionary of div_query objects.

fairdiverse.search.utils.process_dataset.get_doc_judge(qd, dd, ds, config)

Loads the document lists and relevance score lists for the corresponding queries.

Parameters:
  • qd – A dictionary of div_query objects.

  • dd – A dictionary of document IDs for each query.

  • ds – A dictionary of relevance scores for documents for each query.

  • config – A dictionary containing configuration settings.

Returns:

The updated qd dictionary, with each query's documents, relevance scores, and relevance judgments added.

fairdiverse.search.utils.process_dataset.get_docs_dict(config)

Loads the document IDs and their relevance scores for each query. The two dictionaries have the following shape:

docs_dict[qid] = [doc_id, …]
docs_rel_score_dict[qid] = [score, …]

Parameters:

config – A dictionary containing configuration settings.

Returns:

Two dictionaries: docs_dict (query ID to document IDs) and docs_rel_score_dict (query ID to relevance scores).
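The returned pair can be read as two parallel mappings keyed by query ID. The sketch below uses made-up IDs and scores, and assumes the two lists for a query are index-aligned (the i-th score belongs to the i-th document), which the description suggests but does not state outright. The helper pair_docs_with_scores is hypothetical and not part of fairdiverse.

```python
# Hypothetical data in the documented shape; IDs and scores are invented.
docs_dict = {
    "1": ["doc-a", "doc-b", "doc-c"],
    "2": ["doc-d", "doc-e"],
}
docs_rel_score_dict = {
    "1": [0.9, 0.4, 0.1],
    "2": [0.7, 0.2],
}


def pair_docs_with_scores(docs, scores):
    """Combine the two dictionaries into {qid: [(doc_id, score), ...]}."""
    paired = {}
    for qid, doc_ids in docs.items():
        rel = scores[qid]
        # Assumption: both lists are aligned by position for each query.
        if len(doc_ids) != len(rel):
            raise ValueError(f"length mismatch for query {qid}")
        paired[qid] = list(zip(doc_ids, rel))
    return paired


paired = pair_docs_with_scores(docs_dict, docs_rel_score_dict)
```

Checking the lengths up front makes a misaligned input fail loudly instead of silently dropping documents, which zip would otherwise do.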

fairdiverse.search.utils.process_dataset.get_query_dict(config)

Generates a dictionary of queries and their subtopics.

Parameters:

config – A dictionary containing configuration settings.

Returns:

A dictionary mapping query IDs (qid) to div_query objects containing the query and subtopics.

fairdiverse.search.utils.process_dataset.get_query_suggestion(dq, config)

Adds query suggestions to the query dictionary (dq) for each query.

Parameters:
  • dq – A dictionary of div_query objects.

  • config – A dictionary containing configuration settings.

Returns:

A dictionary of div_query objects with added query suggestions.

fairdiverse.search.utils.process_dataset.get_stand_best_metric(qd, config)

Loads the best alpha-nDCG metric from the DSSA.

Parameters:
  • qd – A dictionary of div_query objects.

  • config – A dictionary containing configuration settings.
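For readers unfamiliar with the metric, alpha-nDCG divides the alpha-DCG of a ranking by the alpha-DCG of an ideal ranking, which is why a precomputed "best" value is needed per query. The sketch below shows the standard definition only; it is not taken from the fairdiverse source, and alpha_dcg, alpha_ndcg, and subtopics_of are hypothetical names.

```python
import math


def alpha_dcg(ranking, subtopics_of, alpha=0.5):
    """alpha-DCG of a ranked list of document IDs.

    subtopics_of maps doc ID -> set of subtopic IDs the doc is relevant to.
    """
    seen = {}    # subtopic -> number of higher-ranked docs covering it
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        gain = 0.0
        for s in subtopics_of.get(doc, ()):
            gain += (1 - alpha) ** seen.get(s, 0)  # penalise redundancy
            seen[s] = seen.get(s, 0) + 1
        total += gain / math.log2(rank + 1)        # position discount
    return total


def alpha_ndcg(ranking, ideal_ranking, subtopics_of, alpha=0.5):
    """Normalise by the alpha-DCG of the ideal (best) ranking."""
    best = alpha_dcg(ideal_ranking, subtopics_of, alpha)
    if best == 0:
        return 0.0
    return alpha_dcg(ranking, subtopics_of, alpha) / best


# Made-up example data.
subtopics = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
ideal = ["d1", "d3", "d2"]
score = alpha_ndcg(["d1", "d2", "d3"], ideal, subtopics)
```

Here score falls below 1 because the second position repeats an already covered subtopic, while the ideal ranking scores exactly 1 against itself.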