ranklib_ranker

class fairdiverse.search.ranker_model.ranklib_ranker.RankLib(configs, dataset)

Bases: Ranker

Wrapper class for running the ranking models available in the RankLib library. For details on the available models and their parameters, see the official documentation: https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/

assign_judgement(x, th, cols)

Assigns judgement scores based on relevance ranking.

This method assigns a judgement score to each document based on its relevance to a query.

:param x (pandas.DataFrame): The subset of data belonging to a single query.
:param th (float): The threshold for classifying relevance.
:param cols (list): The list of feature columns.
:return (pandas.DataFrame): The data with the assigned judgement scores.

create_ranklib_data(cols, data, out_dir, split)

Formats and writes data for RankLib.

This method prepares the data by formatting it according to RankLib’s required format and writes it to a text file.

:param cols (list): The list of feature columns.
:param data (pandas.DataFrame): The data to be written to a text file.
:param out_dir (Path): The output directory where the file will be saved.
:param split (str): The type of data, either “train” or “test”.
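
RankLib reads a LETOR/SVMlight-style text file in which each line has the form `<label> qid:<query id> <index>:<value> ... # <comment>`. The following is a minimal sketch of that formatting step, not the FairDiverse implementation: the function name `to_ranklib_lines` and the column names `judgement`, `qid`, and `doc_id` are hypothetical and chosen only for illustration.

```python
from typing import Any, Iterable, List, Mapping


def to_ranklib_lines(
    rows: Iterable[Mapping[str, Any]],
    cols: List[str],
    label_col: str = "judgement",  # hypothetical column names
    query_col: str = "qid",
    id_col: str = "doc_id",
) -> List[str]:
    """Format rows as '<label> qid:<qid> 1:<v1> 2:<v2> ... # <doc id>'."""
    lines = []
    for row in rows:
        # Feature indices in the RankLib input format start at 1.
        features = " ".join(
            f"{i}:{float(row[c])}" for i, c in enumerate(cols, start=1)
        )
        lines.append(
            f"{row[label_col]} qid:{row[query_col]} {features} # {row[id_col]}"
        )
    return lines


rows = [
    {"judgement": 2, "qid": 1, "doc_id": "d1", "f_a": 0.5, "f_b": 1.0},
    {"judgement": 0, "qid": 1, "doc_id": "d2", "f_a": 0.1, "f_b": 0.0},
]
lines = to_ranklib_lines(rows, cols=["f_a", "f_b"])
print(lines[0])  # 2 qid:1 1:0.5 2:1.0 # d1
```

With a pandas DataFrame, the rows could be obtained with `data.to_dict("records")`, and the resulting lines joined with newlines and written to a file under `out_dir`.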

generate_ranklib_data(data_train, data_test, run)

Generates data formatted for RankLib training and testing.

This method prepares the training and testing data for RankLib by generating the required feature matrix and label information in a format that RankLib can process.

:param data_train (pandas.DataFrame): The training dataset.
:param data_test (pandas.DataFrame): The testing dataset.
:param run (str): The identifier for the current run.

predict(data, run, file_name)

Generates predictions using the trained RankLib model.

This method reads the predictions from the trained model and saves them as a CSV file.

:param data (pandas.DataFrame): The dataset on which predictions need to be made.
:param run (str): The identifier for the current run.
:param file_name (str): The file name to save the predictions as a CSV.
:return (pandas.DataFrame): A DataFrame containing the predictions.

read_predictions(data, run)

Retrieves learning-to-rank (LTR) predictions for the dataset.

This method loads the predictions from the trained RankLib model.

:param data (pandas.DataFrame): The dataset for which predictions need to be made.
:param run (str): The identifier for the run.
:return (pandas.DataFrame): The dataset with predictions added.

train(data_train, data_test, run)

Trains ranking models using RankLib.

This method generates RankLib-compatible training data and then runs the RankLib training script.

:param data_train (pandas.DataFrame): The training dataset to be used for training the ranking model.
:param data_test (pandas.DataFrame): The testing dataset to be used for evaluating the ranking model.
:param run (str): The identifier for the current training run.
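
The page does not show how the training script invokes RankLib. As a rough sketch, assuming a direct call to the RankLib jar, a training command could be assembled as below. The helper name `build_ranklib_train_cmd` and all file paths are hypothetical; the flags `-train`, `-test`, `-ranker`, `-metric2t`, and `-save` are RankLib command-line options.

```python
from pathlib import Path
from typing import List


def build_ranklib_train_cmd(
    jar: Path,  # hypothetical location of the RankLib jar
    train_file: Path,
    test_file: Path,
    model_file: Path,
    ranker: int = 6,  # 6 selects LambdaMART in RankLib
    metric: str = "NDCG@10",
) -> List[str]:
    """Assemble the argument list for a RankLib training run."""
    return [
        "java", "-jar", str(jar),
        "-train", str(train_file),
        "-test", str(test_file),
        "-ranker", str(ranker),
        "-metric2t", metric,  # metric optimised on the training data
        "-save", str(model_file),
    ]


cmd = build_ranklib_train_cmd(
    Path("RankLib.jar"), Path("train.txt"), Path("test.txt"), Path("model.txt")
)
print(" ".join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)`. Building it as a list rather than a single string avoids shell quoting problems when paths contain spaces.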

fairdiverse.search.ranker_model.ranklib_ranker.get_LTR_predict(data, out_dir, ranker, score_col, query_col, id_col)

Fetches RankLib prediction scores.

This method loads prediction scores from the model and merges them with the provided dataset.

:param data (pandas.DataFrame): The dataset that needs the predictions.
:param out_dir (Path): The directory where the RankLib predictions are stored.
:param ranker (str): The name of the ranking model used.
:param score_col (str): The column name of the score in the dataset.
:param query_col (str): The column representing queries.
:param id_col (str): The unique identifier for each data point.
:return (pandas.DataFrame): The dataset with added prediction scores.

fairdiverse.search.ranker_model.ranklib_ranker.get_prediction_scores(pred_path)

Retrieves prediction scores from the latest RankLib experiment.

This method reads the predictions generated from the latest experiment and returns them.

:param pred_path (Path): The directory containing the prediction files.
:return (dict): A dictionary mapping document IDs to predicted scores.
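
The layout of the prediction files is not documented here. The sketch below assumes score lines with three whitespace-separated columns (query ID, document index within the query, score), which is how RankLib's `-score` option is understood to write its output. The function name `parse_score_lines` is hypothetical, and keying the result by a `(query ID, document index)` pair is an illustrative choice that may differ from the dictionary `get_prediction_scores` actually returns.

```python
from typing import Dict, Iterable, Tuple


def parse_score_lines(lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Map (query id, document index) to the predicted score."""
    scores: Dict[Tuple[str, int], float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: <qid> <doc index> <score>
        qid, doc_index, score = line.split()
        scores[(qid, int(doc_index))] = float(score)
    return scores


sample = ["1\t0\t0.75", "1\t1\t-0.25", "", "2\t0\t1.5"]
scores = parse_score_lines(sample)
print(scores[("1", 0)])  # 0.75
```

To read from disk, an open file handle can be passed directly, since iterating over it yields lines.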