process_bm25
- fairdiverse.search.utils.process_bm25.calculate_bm25(query, documents, k1=1.5, b=0.75)
Calculates BM25 relevance scores between a query and a list of documents. This function implements the BM25 ranking algorithm, which combines term frequency, inverse document frequency, and document length normalization to score document relevance to a query.
- Parameters:
query – A string containing the search query
documents – A list of strings, where each string is a document’s text
k1 – Float parameter controlling term frequency scaling (default: 1.5)
b – Float parameter controlling document length normalization (default: 0.75)
- Returns:
A list of float values representing BM25 scores for each document
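The scoring described above can be illustrated with a minimal, self-contained sketch. This is not the fairdiverse implementation: the function name `bm25_scores_sketch`, the lowercase whitespace tokenization, and the non-negative "plus one" IDF variant are all assumptions chosen for illustration. The actual `calculate_bm25` may tokenize and weight terms differently.

```python
import math
from collections import Counter


def bm25_scores_sketch(query, documents, k1=1.5, b=0.75):
    """Illustrative BM25 scorer (hypothetical; not the library's code)."""
    # Assumption: lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Average document length, used for length normalization.
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Assumption: IDF variant that never goes negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls how strongly long documents are penalized.
            norm = 1.0 - b + b * len(toks) / avgdl
            # k1 controls how quickly repeated terms saturate.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
scores = bm25_scores_sketch("cat mat", docs)
```

In this sketch only the first document receives a non-zero score, because exact token matching treats "cats" and "cat" as different terms. Setting `b=0` disables length normalization, and larger `k1` values let repeated terms keep adding to the score for longer.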
- fairdiverse.search.utils.process_bm25.generate_bm25_scores_for_query(config)
Generates BM25 relevance scores for queries and their suggested variations against documents. This function processes each query and its suggested alternatives, calculating BM25 scores against a collection of documents. The scores are computed for both the original query and its suggestions, then saved to a pickle file.
- Parameters:
config – A dictionary containing configuration parameters including data directories and model settings.
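The workflow described above (score each query and its suggestions, then pickle the results) might look roughly like the following sketch. Everything here is hypothetical: the function name `score_queries_with_suggestions`, the shape of the `queries` mapping, the output layout, and the explicit `output_path` argument are assumptions for illustration. The real function reads its inputs and paths from `config`, whose keys are not documented here.

```python
import pickle


def score_queries_with_suggestions(queries, documents, score_fn, output_path):
    """Score each query and its suggestions, then pickle the results.

    Hypothetical sketch: `queries` maps a query id to a dict with a
    "query" string and a "suggestions" list of strings; `score_fn`
    takes (query, documents) and returns one score per document.
    """
    results = {}
    for qid, entry in queries.items():
        results[qid] = {
            # Scores for the original query, one per document.
            "query": score_fn(entry["query"], documents),
            # One score list per suggested variation.
            "suggestions": [score_fn(s, documents) for s in entry["suggestions"]],
        }
    # Persist all scores in a single pickle file.
    with open(output_path, "wb") as fh:
        pickle.dump(results, fh)
    return results
```

Passing the scorer as `score_fn` keeps the sketch independent of any particular BM25 implementation; a function with the same call shape as `calculate_bm25(query, documents)` could be supplied there.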