process_dataset
- fairdiverse.search.utils.process_dataset.calculate_best_rank(qd, config)
Calculates the best ranking of documents for each query.
- Parameters:
qd – A dictionary of div_query objects.
config – A dictionary containing configuration settings.
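The reference above does not say how the best ranking is computed. As a rough illustration only, one common approach in search result diversification is a greedy selection that repeatedly picks the document with the largest novelty gain over subtopics not yet covered. The function `greedy_best_rank`, its `doc_subtopics` input, and the `alpha` parameter below are hypothetical and are not part of fairdiverse.

```python
from typing import Dict, List, Set


def greedy_best_rank(doc_subtopics: Dict[str, Set[int]], alpha: float = 0.5) -> List[str]:
    """Hypothetical sketch: order documents greedily by discounted subtopic gain.

    doc_subtopics maps a document ID to the set of subtopics it is relevant to.
    Each time a subtopic is covered again, its gain is multiplied by (1 - alpha).
    """
    covered: Dict[int, int] = {}          # subtopic -> times covered so far
    remaining = sorted(doc_subtopics)     # sorted so ties break deterministically
    ranking: List[str] = []

    while remaining:
        # Pick the document whose subtopics add the most discounted gain.
        best = max(
            remaining,
            key=lambda doc: sum(
                (1 - alpha) ** covered.get(s, 0) for s in doc_subtopics[doc]
            ),
        )
        ranking.append(best)
        remaining.remove(best)
        for s in doc_subtopics[best]:
            covered[s] = covered.get(s, 0) + 1

    return ranking


docs = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
print(greedy_best_rank(docs))  # ['d1', 'd3', 'd2']
```

In this toy input, `d3` is ranked ahead of `d2` because subtopic 1 is already covered by `d1`, so `d2` contributes only a discounted gain.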
- fairdiverse.search.utils.process_dataset.data_process(config)
Main function for processing query, document, and relevance data.
- Parameters:
config – A dictionary containing configuration settings.
- fairdiverse.search.utils.process_dataset.data_process_worker(task, data_dir)
Processes a list of queries and saves them as data files.
- Parameters:
task – A list of query ID and div_query object pairs to process.
data_dir – The directory where processed query data will be saved.
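A worker of this kind can be sketched as a loop that writes one file per query. The sketch below assumes that each task item is a `(qid, object)` pair, that files are pickled, and that they are named `<qid>.pkl`; the helper name `save_query_batch`, the file naming, and the serialization format are all assumptions and may differ from what fairdiverse actually does.

```python
import os
import pickle
from typing import Any, List, Tuple


def save_query_batch(task: List[Tuple[str, Any]], data_dir: str) -> List[str]:
    """Hypothetical sketch: pickle each (qid, query object) pair into data_dir."""
    os.makedirs(data_dir, exist_ok=True)  # create the output directory if needed
    written = []
    for qid, query_obj in task:
        path = os.path.join(data_dir, f"{qid}.pkl")  # assumed naming scheme
        with open(path, "wb") as fh:
            pickle.dump(query_obj, fh)
        written.append(path)
    return written
```

Because each query is written to its own file, a list of tasks can be split into chunks and handed to separate processes without the workers contending for a shared output file.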
- fairdiverse.search.utils.process_dataset.generate_qd(config)
Generates a div_query file from the data directory.
- Parameters:
config – A dictionary containing configuration settings.
- Returns:
A dictionary of div_query objects.
- fairdiverse.search.utils.process_dataset.get_doc_judge(qd, dd, ds, config)
Loads the document lists and relevance score lists for the corresponding queries.
- Parameters:
qd – A dictionary of div_query objects.
dd – A dictionary of document IDs for each query.
ds – A dictionary of relevance scores for documents for each query.
config – A dictionary containing configuration settings.
- Returns:
The updated qd dictionary, with each query's documents, relevance scores, and relevance judgments added.
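The merge step described above can be pictured as copying each query's document list and score list onto its query object. The sketch below uses a hypothetical `QueryStub` dataclass as a stand-in for div_query; its attribute names (`doc_list`, `doc_score_list`) and the helper `attach_docs` are assumptions for illustration, and the real class and its judging logic may look different.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryStub:
    """Hypothetical stand-in for a div_query object."""
    qid: str
    doc_list: List[str] = field(default_factory=list)
    doc_score_list: List[float] = field(default_factory=list)


def attach_docs(
    qd: Dict[str, QueryStub],
    dd: Dict[str, List[str]],
    ds: Dict[str, List[float]],
) -> Dict[str, QueryStub]:
    """Copy each query's documents and scores from dd and ds onto its object."""
    for qid, query in qd.items():
        # Queries with no entry in dd or ds simply get empty lists.
        query.doc_list = list(dd.get(qid, []))
        query.doc_score_list = list(ds.get(qid, []))
    return qd
```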
- fairdiverse.search.utils.process_dataset.get_docs_dict(config)
Loads the document IDs and their relevance scores for each query. The results have the form docs_dict[qid] = [doc_id, …] and docs_rel_score_dict[qid] = [score, …].
- Parameters:
config – A dictionary containing configuration settings.
- Returns:
Two dictionaries: docs_dict (query ID to document IDs) and docs_rel_score_dict (query ID to relevance scores).
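The two returned dictionaries are parallel: the i-th score in docs_rel_score_dict[qid] belongs to the i-th document in docs_dict[qid]. A minimal sketch of building such a pair is shown below, assuming a whitespace-separated input of `qid doc_id score` per line; that line format and the helper name `build_docs_dicts` are hypothetical, since the reference does not describe the on-disk format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_docs_dicts(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[str]], Dict[str, List[float]]]:
    """Hypothetical sketch: build parallel qid -> doc IDs and qid -> scores maps."""
    docs_dict: Dict[str, List[str]] = defaultdict(list)
    docs_rel_score_dict: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        qid, doc_id, score = parts
        # Append to both lists together so their positions stay aligned.
        docs_dict[qid].append(doc_id)
        docs_rel_score_dict[qid].append(float(score))
    return dict(docs_dict), dict(docs_rel_score_dict)


docs, scores = build_docs_dicts(["1 docA 0.9", "1 docB 0.4", "2 docC 0.7"])
print(docs)    # {'1': ['docA', 'docB'], '2': ['docC']}
print(scores)  # {'1': [0.9, 0.4], '2': [0.7]}
```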
- fairdiverse.search.utils.process_dataset.get_query_dict(config)
Generates a dictionary of queries and their subtopics.
- Parameters:
config – A dictionary containing configuration settings.
- Returns:
A dictionary mapping query IDs (qid) to div_query objects containing the query and subtopics.
- fairdiverse.search.utils.process_dataset.get_query_suggestion(dq, config)
Adds query suggestions to the query dictionary (dq) for each query.
- Parameters:
dq – A dictionary of div_query objects.
config – A dictionary containing configuration settings.
- Returns:
A dictionary of div_query objects with added query suggestions.