process_bm25

fairdiverse.search.utils.process_bm25.calculate_bm25(query, documents, k1=1.5, b=0.75)

Calculates BM25 relevance scores between a query and a list of documents. This function implements the BM25 ranking algorithm, which combines term frequency, inverse document frequency, and document length normalization to score document relevance to a query.

Parameters:
  • query – A string containing the search query

  • documents – A list of strings, where each string is a document’s text

  • k1 – Float parameter controlling term frequency scaling (default: 1.5)

  • b – Float parameter controlling document length normalization (default: 0.75)

Returns:

A list of floats, one BM25 score per document
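
The scoring described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name bm25_scores_sketch, the lowercase whitespace tokenization, and the smoothed IDF variant are all assumptions made for illustration, and the actual calculate_bm25 may tokenize text and compute IDF differently.

```python
import math
from collections import Counter


def bm25_scores_sketch(query, documents, k1=1.5, b=0.75):
    """Illustrative BM25 scorer (hypothetical; not the library's code)."""
    # Assumption: lowercase whitespace tokenization.
    tokenized_docs = [doc.lower().split() for doc in documents]
    query_terms = query.lower().split()

    n_docs = len(tokenized_docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized_docs) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized_docs:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue  # term absent from this document
            df = doc_freq[term]
            # Assumption: smoothed IDF with "+ 1" inside the log,
            # which keeps the value non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: b = 0 disables it, b = 1 applies it fully.
            norm = 1.0 - b + b * len(tokens) / avg_len
            # k1 controls how quickly repeated occurrences saturate.
            score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
        scores.append(score)
    return scores
```

For example, bm25_scores_sketch("cat mat", ["the cat sat on the mat", "dogs chase cats"]) gives a positive score for the first document and 0.0 for the second, because this sketch matches whole tokens only and "cats" is not the same token as "cat".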

fairdiverse.search.utils.process_bm25.generate_bm25_scores_for_query(config)

Generates BM25 relevance scores for queries and their suggested variations against a collection of documents. For each query, scores are computed for both the original query and each of its suggestions, and the results are saved to a pickle file.

Parameters:

config – A dictionary containing configuration parameters including data directories and model settings.
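
The overall flow (score each query and its suggestions, then write the results to a pickle file) might look like the sketch below. Everything in it is hypothetical: the helper name score_queries_sketch, the query_suggestions mapping, the score_fn argument, the output_path argument, and the layout of the saved dictionary are illustrative assumptions. The real function reads its inputs and output location from config, whose keys are not documented here.

```python
import pickle


def score_queries_sketch(query_suggestions, documents, score_fn, output_path):
    """Hypothetical helper: score queries and their suggestions, then pickle the result.

    query_suggestions: dict mapping each query string to a list of suggestion strings.
    score_fn: callable (query, documents) -> list of floats, e.g. a BM25 scorer.
    """
    results = {}
    for query, suggestions in query_suggestions.items():
        results[query] = {
            # Scores of the original query against every document.
            "query_scores": score_fn(query, documents),
            # Scores of each suggested variation against every document.
            "suggestion_scores": {s: score_fn(s, documents) for s in suggestions},
        }

    # Persist the scores so later stages can load them without recomputing.
    with open(output_path, "wb") as f:
        pickle.dump(results, f)
    return results
```

Passing the scorer in as score_fn keeps the sketch independent of any particular BM25 implementation.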