div_type

class fairdiverse.search.utils.div_type.div_dataset(config)

Bases: object

get_listpair_train_data(top_n=50)

Generates list-pair training samples from the top N relevant documents. For each query, this function takes the best document ranking, generates list-pair samples from it, and saves them to the file listpair_train.data.

The saved structure is data_dict[qid] = [(metrics, positive_mask, negative_mask, weight), …], where metrics, positive_mask, and negative_mask are tensors padded to length top_n.

Parameters:

top_n – The number of top-ranked documents to use for generating the list-pairs.

Returns:

Saves the generated list-pair training data into a file.
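
As a rough illustration of the sample layout described above, the sketch below builds one (metrics, positive_mask, negative_mask, weight) tuple padded to top_n. It uses plain Python lists rather than tensors. The helper names pad_to and make_sample, the one-hot reading of the two masks, and the example values are all assumptions made for illustration; they are not taken from the library.

```python
def pad_to(values, top_n, fill=0.0):
    """Truncate or right-pad a sequence to exactly top_n entries."""
    values = list(values)[:top_n]
    return values + [fill] * (top_n - len(values))


def make_sample(metrics, pos_index, neg_index, weight, top_n=50):
    """Build one hypothetical list-pair sample.

    Assumption: each mask is a one-hot vector marking one position
    among the top_n documents. The real masks may be defined differently.
    """
    positive_mask = [0.0] * top_n
    negative_mask = [0.0] * top_n
    positive_mask[pos_index] = 1.0
    negative_mask[neg_index] = 1.0
    return (pad_to(metrics, top_n), positive_mask, negative_mask, weight)


# Hypothetical data: one query ID mapped to a list holding a single sample.
data_dict = {"q1": [make_sample([2.0, 1.0, 0.5], 0, 2, weight=1.5, top_n=5)]}
metrics, positive_mask, negative_mask, weight = data_dict["q1"][0]
```

Every per-document entry ends up with the same length (top_n), which is what would let the samples for a query be stacked into fixed-size batches.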

get_listpairs(div_query, context, top_n)

Generates list-pair samples for a query from its top-ranked documents and a context of previously considered documents.

Parameters:
  • div_query – The query object that contains the list of ranked documents.

  • context – A list of previously considered documents (the context).

  • top_n – The number of top-ranked documents to consider.

Returns:

A list of generated samples, each containing metrics, positive/negative masks, and weights.

class fairdiverse.search.utils.div_type.div_query(qid, query, subtopic_id_list, subtopic_list)

Bases: object

add_docs(doc_list)

Adds a list of documents to the query and initializes subtopic relevance tracking.

Parameters:

doc_list – List of document identifiers.

add_docs_rel_score(doc_score_list)

Adds relevance scores for the documents associated with the query.

Parameters:

doc_score_list – List of relevance scores for documents.

add_query_suggestion(query_suggestion)

Adds query suggestions related to the main query.

Parameters:

query_suggestion – Suggested query string.

get_alpha_DCG(docs_rank, print_flag=False)

Computes the alpha-DCG of the input document ranking (used when generating training samples).

Parameters:
  • docs_rank – A list of document IDs representing the ranking order.

  • print_flag – A boolean flag indicating whether to print intermediate computation results.

Returns:

The computed alpha-DCG score for the given document ranking.
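
For orientation, the standard alpha-DCG definition can be sketched as follows: each subtopic a document covers contributes a gain that is multiplied by (1 - alpha) for every earlier document covering the same subtopic, and the per-rank gain is divided by log2(rank + 1). The function name alpha_dcg, the doc_subtopics mapping, and the default alpha of 0.5 are assumptions for illustration; how this method stores subtopic relevance internally is not documented here and may differ.

```python
import math


def alpha_dcg(docs_rank, doc_subtopics, alpha=0.5):
    """Textbook alpha-DCG of a ranking.

    doc_subtopics is a hypothetical dict mapping each document ID to the
    set of subtopic IDs it is relevant to.
    """
    seen = {}  # subtopic ID -> number of earlier documents covering it
    score = 0.0
    for rank, doc in enumerate(docs_rank, start=1):
        gain = 0.0
        for subtopic_id in doc_subtopics.get(doc, ()):
            # Repeated coverage of a subtopic is worth less each time.
            gain += (1 - alpha) ** seen.get(subtopic_id, 0)
            seen[subtopic_id] = seen.get(subtopic_id, 0) + 1
        score += gain / math.log2(rank + 1)
    return score


# Hypothetical judgments: d1 covers subtopics 1 and 2, d2 repeats 1, d3 adds 3.
judgments = {"d1": {1, 2}, "d2": {1}, "d3": {3}}
score = alpha_dcg(["d1", "d2", "d3"], judgments)
```

In this example d2 is penalized because subtopic 1 was already covered by d1, so moving d3 ahead of d2 would give a higher score.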

get_best_rank(top_n=None, alpha=0.5)

Generates the best document ranking using a greedy selection strategy.

Parameters:
  • top_n – The number of top documents to be selected (default: all available documents).

  • alpha – A parameter controlling redundancy reduction (default: 0.5).

Returns:

Updates class attributes with the best document ranking and associated gains.
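
A greedy selection of this kind is commonly implemented by repeatedly picking the remaining document with the largest novelty-discounted gain. The sketch below shows that general idea only. The function greedy_best_rank, its doc_subtopics argument, and the returned (documents, gains) pair are hypothetical; the method documented above stores its results on the object instead of returning them, and its tie-breaking rules are not specified here.

```python
def greedy_best_rank(doc_subtopics, top_n=None, alpha=0.5):
    """Greedily order documents by marginal alpha-discounted gain.

    doc_subtopics is a hypothetical dict mapping each document ID to the
    set of subtopic IDs it is relevant to.
    """
    remaining = list(doc_subtopics)
    if top_n is None:
        top_n = len(remaining)  # default: rank all available documents
    seen = {}  # subtopic ID -> times covered by already selected documents

    def gain(doc):
        return sum((1 - alpha) ** seen.get(s, 0) for s in doc_subtopics[doc])

    best_docs, best_gains = [], []
    while remaining and len(best_docs) < top_n:
        pick = max(remaining, key=gain)  # ties go to the earliest document
        best_gains.append(gain(pick))
        best_docs.append(pick)
        remaining.remove(pick)
        for s in doc_subtopics[pick]:
            seen[s] = seen.get(s, 0) + 1
    return best_docs, best_gains


# Hypothetical judgments for one query.
best_docs, best_gains = greedy_best_rank({"d1": {1}, "d2": {1, 2}, "d3": {3}})
```

A larger alpha penalizes repeated subtopics more strongly, so the greedy order favors documents that cover subtopics not yet seen.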

get_test_alpha_nDCG(docs_rank)

Computes the alpha-nDCG@20 of the input document ranking (used for testing).

Parameters:

docs_rank – Ordered list of document identifiers.

Returns:

Alpha-nDCG score for the given ranking.
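
Alpha-nDCG is conventionally the alpha-DCG of the evaluated ranking divided by the alpha-DCG of an ideal (or greedily approximated) ranking, both truncated at the same cutoff. The sketch below assumes the per-rank gains have already been computed. The names dcg_from_gains and alpha_ndcg and the gain lists are hypothetical, and whether the method above uses the value passed to set_std_metric as its normalizer is an assumption, not something this page states.

```python
import math


def dcg_from_gains(gains, k=20):
    """Discounted sum of the first k per-rank gains."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def alpha_ndcg(gains, ideal_gains, k=20):
    """Normalize a ranking's alpha-DCG@k by the ideal ranking's alpha-DCG@k."""
    ideal = dcg_from_gains(ideal_gains, k)
    if ideal <= 0:
        return 0.0  # no relevant documents: define the score as zero
    return dcg_from_gains(gains, k) / ideal


# Hypothetical per-rank gains for an evaluated ranking and an ideal ranking.
ndcg = alpha_ndcg([2.0, 0.5, 1.0], [2.0, 1.0, 0.5])
```

Because the ideal ranking places its largest gains first, the ratio stays at or below 1 and reaches 1 only when the evaluated ranking matches the ideal gains up to the cutoff.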

set_std_metric(m)

Sets the standard alpha-DCG metric for normalization.

Parameters:

m – Standard alpha-DCG metric value.

class fairdiverse.search.utils.div_type.subtopic(subtopic_id, subtopic)

Bases: object