div_type

class fairdiverse.search.utils.div_type.div_dataset(config)

Bases: object

get_listpair_train_data(top_n=50)

Generates list-pair training samples from the top N relevant documents. For each query, this method takes the best document ranking, generates list-pair samples from it, and saves them to the file listpair_train.data. The saved structure is data_dict[qid] = [(metrics, positive_mask, negative_mask, weight), …], where metrics, positive_mask, and negative_mask are padded into tensors of length top_n.

Parameters: top_n – The number of top-ranked documents to use for generating the list-pairs.
Returns: Saves the generated list-pair training data into a file.
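A minimal sketch of the data_dict layout described above, assuming binary masks that mark which of the top_n candidate slots are positive and which are negative. The helper names `pad` and `build_sample` are hypothetical, and plain Python lists stand in for the padded tensors so the example has no dependencies.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (metrics, positive_mask, negative_mask, weight); lists stand in for padded tensors.
Sample = Tuple[List[float], List[int], List[int], float]


def pad(values: Sequence[float], top_n: int, fill: float = 0.0) -> List[float]:
    """Truncate or right-pad `values` to exactly `top_n` entries."""
    values = list(values)[:top_n]
    return values + [fill] * (top_n - len(values))


def build_sample(metrics: Sequence[float], pos_idx: Set[int], neg_idx: Set[int],
                 weight: float, top_n: int) -> Sample:
    """Build one list-pair sample whose first three fields have length `top_n` (hypothetical helper)."""
    positive_mask = [1 if i in pos_idx else 0 for i in range(top_n)]
    negative_mask = [1 if i in neg_idx else 0 for i in range(top_n)]
    return pad(metrics, top_n), positive_mask, negative_mask, float(weight)


# data_dict[qid] = [(metrics, positive_mask, negative_mask, weight), ...]
data_dict: Dict[str, List[Sample]] = {
    "q1": [build_sample([0.9, 0.4, 0.1], pos_idx={0}, neg_idx={2}, weight=0.8, top_n=5)],
}
```

Padding every field to the same length lets samples from queries with fewer than top_n documents be batched together.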

get_listpairs(div_query, context, top_n)

Generates list-pair samples for a query, given the current context.

Parameters:

div_query – The query object that contains the list of ranked documents.
context – A list of previously considered documents in the context.
top_n – The number of top-ranked documents to consider.

Returns:

A list of generated samples, each containing metrics, positive/negative masks, and weights.
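The exact sampling rule is not documented here. The sketch below shows one plausible scheme, which is an assumption rather than the library's implementation. It computes each candidate's marginal alpha-DCG gain as if the candidate were appended to `context`. Every pair of candidates with different gains then yields one sample: the positive mask marks the better document, the negative mask marks the worse one, and the weight is the gain difference. `doc_subtopics` (a document ID mapped to binary subtopic judgments), `marginal_gain`, and `get_listpairs_sketch` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], List[int], List[int], float]


def marginal_gain(doc: str, context: Sequence[str],
                  doc_subtopics: Dict[str, List[int]], alpha: float = 0.5) -> float:
    """Discounted alpha-DCG gain of placing `doc` directly after `context`."""
    judg = doc_subtopics[doc]
    # Number of context documents that already cover each subtopic.
    seen = [sum(doc_subtopics[d][i] for d in context) for i in range(len(judg))]
    gain = sum(rel * (1 - alpha) ** seen[i] for i, rel in enumerate(judg))
    rank = len(context) + 1
    return gain / math.log2(rank + 1)


def get_listpairs_sketch(candidates: Sequence[str], context: Sequence[str],
                         doc_subtopics: Dict[str, List[int]], top_n: int,
                         alpha: float = 0.5) -> List[Sample]:
    """Hypothetical list-pair generation: one sample per candidate pair with unequal gains."""
    # Candidate slots exclude documents already in the context.
    cands = [d for d in list(candidates)[:top_n] if d not in context]
    gains = [marginal_gain(d, context, doc_subtopics, alpha) for d in cands]
    metrics = gains + [0.0] * (top_n - len(gains))  # pad to top_n
    samples: List[Sample] = []
    for i, gi in enumerate(gains):
        for j, gj in enumerate(gains):
            if gi > gj:  # slot i should be ranked above slot j
                pos, neg = [0] * top_n, [0] * top_n
                pos[i], neg[j] = 1, 1
                samples.append((list(metrics), pos, neg, gi - gj))
    return samples
```

Weighting by the gain difference makes pairs whose documents differ a lot in diversity value count more during training.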

class fairdiverse.search.utils.div_type.div_query(qid, query, subtopic_id_list, subtopic_list)

Bases: object

add_docs(doc_list)

Adds a list of documents to the query and initializes subtopic relevance tracking.

Parameters: doc_list – List of document identifiers.

add_docs_rel_score(doc_score_list)

Adds relevance scores for the documents associated with the query.

Parameters: doc_score_list – List of relevance scores for documents.

add_query_suggestion(query_suggestion)

Adds a query suggestion related to the main query.

Parameters: query_suggestion – Suggested query string.

get_alpha_DCG(docs_rank, print_flag=False)

Computes the alpha-DCG score of the given document ranking; used when generating training samples.

Parameters:

docs_rank – A list of document IDs representing the ranking order.
print_flag – A boolean flag indicating whether to print intermediate computation results.

Returns:

The computed alpha-DCG score for the given document ranking.
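A minimal sketch of the standard alpha-DCG computation, assuming per-document binary subtopic judgments stored in a hypothetical `doc_subtopics` dictionary. A document at rank r earns, for each subtopic it covers, a gain of (1 - alpha)^c, where c is the number of higher-ranked documents that already cover that subtopic; the summed gain is then discounted by log2(1 + r). The `alpha` and `cutoff` arguments are assumptions added for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def alpha_dcg(docs_rank: Sequence[str], doc_subtopics: Dict[str, List[int]],
              alpha: float = 0.5, cutoff: Optional[int] = None,
              print_flag: bool = False) -> float:
    """alpha-DCG of `docs_rank` given binary subtopic judgments per document."""
    ranking = list(docs_rank) if cutoff is None else list(docs_rank)[:cutoff]
    num_subtopics = max((len(v) for v in doc_subtopics.values()), default=0)
    seen = [0] * num_subtopics  # how many ranked docs so far cover each subtopic
    score = 0.0
    for rank, doc in enumerate(ranking, start=1):
        judg = doc_subtopics.get(doc, [0] * num_subtopics)  # unjudged docs earn nothing
        # Each covered subtopic contributes (1 - alpha)^(times already covered).
        gain = sum(rel * (1 - alpha) ** seen[i] for i, rel in enumerate(judg))
        score += gain / math.log2(rank + 1)
        for i, rel in enumerate(judg):
            seen[i] += rel
        if print_flag:
            print(f"rank={rank} doc={doc} gain={gain:.4f} alpha_dcg={score:.4f}")
    return score
```

With judgments {"a": [1, 1], "b": [1, 0], "c": [0, 1]}, the ranking a, b, c scores higher than b, c, a because "a" covers both subtopics at the top position, where the discount is smallest.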

get_best_rank(top_n=None, alpha=0.5)

Generates the best document ranking using a greedy selection strategy.

Parameters:

top_n – The number of top documents to be selected (default: all available documents).
alpha – The alpha-DCG redundancy parameter; larger values penalize documents that repeat already-covered subtopics more heavily (default: 0.5).

Returns:

Updates class attributes with the best document ranking and associated gains.
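A sketch of greedy best-ranking construction under the same hypothetical `doc_subtopics` representation: at each step, the remaining document with the largest marginal gain, given the subtopics already covered, is appended. Because every candidate at a given step shares the same rank discount, comparing undiscounted gains is sufficient. Computing the exact ideal alpha-DCG ranking is NP-hard in general, which is why a greedy approximation is the usual choice. The source does not say whether `get_best_rank` stores discounted or undiscounted gains; this sketch returns undiscounted ones, and `greedy_best_rank` is a hypothetical stand-in rather than the method itself.

```python
from typing import Dict, List, Optional, Tuple


def marginal_gain(judg: List[int], seen: List[int], alpha: float) -> float:
    """Undiscounted alpha-DCG gain of a document given current subtopic coverage."""
    return sum(rel * (1 - alpha) ** seen[i] for i, rel in enumerate(judg))


def greedy_best_rank(doc_subtopics: Dict[str, List[int]], top_n: Optional[int] = None,
                     alpha: float = 0.5) -> Tuple[List[str], List[float]]:
    """Greedily build an approximately ideal ranking and its per-step gains."""
    remaining = list(doc_subtopics)
    limit = len(remaining) if top_n is None else min(top_n, len(remaining))
    seen = [0] * max((len(v) for v in doc_subtopics.values()), default=0)
    best_rank: List[str] = []
    best_gains: List[float] = []
    while len(best_rank) < limit:
        # The rank discount is identical for every candidate at this step,
        # so the undiscounted marginal gain decides the choice (ties: first seen wins).
        doc = max(remaining, key=lambda d: marginal_gain(doc_subtopics[d], seen, alpha))
        best_gains.append(marginal_gain(doc_subtopics[doc], seen, alpha))
        best_rank.append(doc)
        remaining.remove(doc)
        for i, rel in enumerate(doc_subtopics[doc]):
            seen[i] += rel
    return best_rank, best_gains
```

For example, with {"a": [1, 0, 0], "b": [1, 1, 0], "c": [0, 0, 1], "d": [1, 1, 1]} and alpha = 0.5, the greedy order is d, b, c, a with gains 3.0, 1.0, 0.5, 0.25.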

get_test_alpha_nDCG(docs_rank)

Computes alpha-nDCG@20 for the given document list (used at test time).

Parameters: docs_rank – Ordered list of document identifiers.
Returns: Alpha-nDCG score for the given ranking.
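alpha-nDCG normalizes alpha-DCG by the score of an ideal ranking. The sketch below assumes the normalizer is the value passed to `set_std_metric` (for example, the alpha-DCG@20 of a greedy best ranking) and reuses the hypothetical `doc_subtopics` layout; `alpha_dcg_at_k` and `alpha_ndcg_at_k` are illustrative names, not library functions.

```python
import math
from typing import Dict, List, Sequence


def alpha_dcg_at_k(docs_rank: Sequence[str], doc_subtopics: Dict[str, List[int]],
                   k: int = 20, alpha: float = 0.5) -> float:
    """alpha-DCG over the first `k` documents of `docs_rank`."""
    seen = [0] * max((len(v) for v in doc_subtopics.values()), default=0)
    score = 0.0
    for rank, doc in enumerate(list(docs_rank)[:k], start=1):
        judg = doc_subtopics.get(doc, [])  # unjudged documents contribute no gain
        gain = sum(rel * (1 - alpha) ** seen[i] for i, rel in enumerate(judg))
        score += gain / math.log2(rank + 1)
        for i, rel in enumerate(judg):
            seen[i] += rel
    return score


def alpha_ndcg_at_k(docs_rank: Sequence[str], doc_subtopics: Dict[str, List[int]],
                    std_metric: float, k: int = 20, alpha: float = 0.5) -> float:
    """Normalize alpha-DCG@k by the ideal score (the value given to set_std_metric)."""
    if std_metric <= 0:  # no relevant subtopics: define the score as 0
        return 0.0
    return alpha_dcg_at_k(docs_rank, doc_subtopics, k, alpha) / std_metric


# Toy usage: a, b, c is an ideal ranking for these judgments.
ds = {"a": [1, 1], "b": [1, 0], "c": [0, 1]}
std = alpha_dcg_at_k(["a", "b", "c"], ds)
ndcg = alpha_ndcg_at_k(["b", "c", "a"], ds, std)
```

Guarding against a zero normalizer avoids a division error for queries whose documents cover no judged subtopic.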

set_std_metric(m)

Sets the standard alpha-DCG metric for normalization.

Parameters: m – Standard alpha-DCG metric value.

class fairdiverse.search.utils.div_type.subtopic(subtopic_id, subtopic)

Bases: object