div_type
- class fairdiverse.search.utils.div_type.div_dataset(config)
Bases: object
- get_listpair_train_data(top_n=50)
Generates list-pair training samples from the top N relevant documents. This function processes the best document ranking for each query, generates list-pair samples, and saves them to the file listpair_train.data.
The saved structure is data_dict[qid] = [(metrics, positive_mask, negative_mask, weight), …], where metrics, positive_mask, and negative_mask are tensors padded to length top_n.
- Parameters:
top_n – The number of top-ranked documents to use for generating the list-pairs.
- Returns:
No return value is documented; the generated list-pair training data is saved to a file.
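As a rough illustration of the layout described above, the following sketch pads per-sample lists to top_n and groups them under a query ID. It uses plain Python lists in place of tensors. The helper names `pad_to` and `make_sample`, the pad value 0.0, the query ID "q1", and all example values are assumptions made for illustration, not the library's code.

```python
def pad_to(values, top_n, pad_value=0.0):
    """Right-pad (or truncate) a list so it has exactly top_n entries."""
    values = list(values)[:top_n]
    return values + [pad_value] * (top_n - len(values))


def make_sample(metrics, positive_mask, negative_mask, weight, top_n):
    """Build one (metrics, positive_mask, negative_mask, weight) tuple."""
    return (
        pad_to(metrics, top_n),
        pad_to(positive_mask, top_n),
        pad_to(negative_mask, top_n),
        weight,
    )


top_n = 5
data_dict = {}
# Hypothetical query ID and values, chosen only to show the shapes.
data_dict["q1"] = [
    make_sample([0.8, 0.5, 0.3], [1, 0, 0], [0, 1, 0], weight=0.3, top_n=top_n)
]
```

Each query ID maps to a list of such tuples, so further samples for the same query would be appended to `data_dict["q1"]`.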
- get_listpairs(div_query, context, top_n)
Generates list-pair samples for a single query.
- Parameters:
div_query – The query object that contains the list of ranked documents.
context – A list of previously considered documents in the context.
top_n – The number of top-ranked documents to consider.
- Returns:
A list of generated samples, each containing metrics, positive/negative masks, and weights.
- class fairdiverse.search.utils.div_type.div_query(qid, query, subtopic_id_list, subtopic_list)
Bases: object
- add_docs(doc_list)
Adds a list of documents to the query and initializes subtopic relevance tracking.
- Parameters:
doc_list – List of document identifiers.
- add_docs_rel_score(doc_score_list)
Adds relevance scores for the documents associated with the query.
- Parameters:
doc_score_list – List of relevance scores for documents.
- add_query_suggestion(query_suggestion)
Adds query suggestions related to the main query.
- Parameters:
query_suggestion – Suggested query string.
- get_alpha_DCG(docs_rank, print_flag=False)
Computes the alpha-DCG of the input document list (used when generating training samples).
- Parameters:
docs_rank – A list of document IDs representing the ranking order.
print_flag – A boolean flag indicating whether to print intermediate computation results.
- Returns:
The computed alpha-DCG score for the given document ranking.
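For readers unfamiliar with the metric, the sketch below shows the standard alpha-DCG definition: each document earns a gain for every subtopic it covers, that gain shrinks by a factor of (1 - alpha) each time the subtopic has already been covered by an earlier document, and the total is discounted by log2(rank + 1). The function name `alpha_dcg`, the `doc_subtopics` mapping, and the binary relevance assumption are illustrative; the library's own implementation may differ in details such as graded relevance or cutoffs.

```python
import math


def alpha_dcg(docs_rank, doc_subtopics, alpha=0.5):
    """Alpha-DCG of a ranking under binary subtopic relevance.

    docs_rank: ordered list of document IDs.
    doc_subtopics: dict mapping a document ID to the set of subtopics
                   it covers (a hypothetical structure for this sketch).
    """
    seen = {}  # subtopic -> number of earlier documents covering it
    score = 0.0
    for rank, doc in enumerate(docs_rank, start=1):
        gain = 0.0
        for s in doc_subtopics.get(doc, ()):
            gain += (1 - alpha) ** seen.get(s, 0)  # novelty-discounted gain
            seen[s] = seen.get(s, 0) + 1
        score += gain / math.log2(rank + 1)  # rank discount
    return score


# Hypothetical toy data: d2 repeats a subtopic already covered by d1.
doc_subtopics = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
redundant_first = alpha_dcg(["d1", "d2", "d3"], doc_subtopics)
novel_first = alpha_dcg(["d1", "d3", "d2"], doc_subtopics)
```

In this toy case `novel_first` exceeds `redundant_first`, because placing the document with a new subtopic (d3) ahead of the redundant one (d2) is rewarded.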
- get_best_rank(top_n=None, alpha=0.5)
Generates the best document ranking using a greedy selection strategy.
- Parameters:
top_n – The number of top documents to be selected (default: all available documents).
alpha – A parameter controlling redundancy reduction (default: 0.5).
- Returns:
No return value is documented; the method updates the object's attributes with the best document ranking and the associated gains.
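A greedy strategy of this kind can be sketched as follows: at each step, pick the remaining document with the largest novelty-discounted gain, then update the subtopic coverage counts. This is a minimal sketch under assumptions: the function name `greedy_best_rank`, the `doc_subtopics` mapping, binary relevance, and the tie-breaking rule are all hypothetical, and the sketch returns its results rather than storing them on an object.

```python
def greedy_best_rank(doc_subtopics, top_n=None, alpha=0.5):
    """Greedily order documents by marginal novelty gain.

    Returns (ranking, gains), where gains[i] is the gain of ranking[i]
    at the moment it was selected.
    """
    remaining = list(doc_subtopics)
    if top_n is None:
        top_n = len(remaining)  # default: rank all available documents
    seen = {}  # subtopic -> number of selected documents covering it
    ranking, gains = [], []

    def marginal_gain(doc):
        return sum((1 - alpha) ** seen.get(s, 0) for s in doc_subtopics[doc])

    while remaining and len(ranking) < top_n:
        best = max(remaining, key=marginal_gain)  # ties: first in input order
        gains.append(marginal_gain(best))
        ranking.append(best)
        remaining.remove(best)
        for s in doc_subtopics[best]:
            seen[s] = seen.get(s, 0) + 1
    return ranking, gains


# Hypothetical toy data.
doc_subtopics = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
ranking, gains = greedy_best_rank(doc_subtopics)
print(ranking)  # ['d1', 'd3', 'd2']
print(gains)    # [2.0, 1.0, 0.5]
```

With alpha = 0.5, d2 is selected last because its only subtopic is already covered by d1, which halves its gain.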
- get_test_alpha_nDCG(docs_rank)
Computes alpha-nDCG@20 for the input document list (used for testing).
- Parameters:
docs_rank – Ordered list of document identifiers.
- Returns:
Alpha-nDCG score for the given ranking.
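The normalized metric is conventionally the alpha-DCG of the ranking divided by the alpha-DCG of an ideal (in practice, greedily built) ranking, both truncated at the cutoff. The sketch below illustrates that ratio at a cutoff of 20. The function names, the `doc_subtopics` mapping, the binary relevance assumption, and passing the ideal ranking in as an argument are assumptions for illustration, not the library's interface.

```python
import math


def alpha_dcg_at_k(docs_rank, doc_subtopics, k=20, alpha=0.5):
    """Alpha-DCG over the first k documents (binary subtopic relevance)."""
    seen, score = {}, 0.0
    for rank, doc in enumerate(docs_rank[:k], start=1):
        gain = 0.0
        for s in doc_subtopics.get(doc, ()):
            gain += (1 - alpha) ** seen.get(s, 0)
            seen[s] = seen.get(s, 0) + 1
        score += gain / math.log2(rank + 1)
    return score


def alpha_ndcg_at_k(docs_rank, ideal_rank, doc_subtopics, k=20, alpha=0.5):
    """Ratio of the ranking's alpha-DCG@k to that of the ideal ranking."""
    ideal = alpha_dcg_at_k(ideal_rank, doc_subtopics, k, alpha)
    if ideal == 0:
        return 0.0  # no relevant subtopics: define the score as 0
    return alpha_dcg_at_k(docs_rank, doc_subtopics, k, alpha) / ideal


# Hypothetical toy data; the ideal order is assumed to come from a greedy pass.
doc_subtopics = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
ideal_rank = ["d1", "d3", "d2"]
score = alpha_ndcg_at_k(["d1", "d2", "d3"], ideal_rank, doc_subtopics)
```

Here `score` falls slightly below 1.0, since the evaluated ranking places the redundant document d2 ahead of d3.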