ranklib_ranker

class fairdiverse.search.ranker_model.ranklib_ranker.RankLib(configs, dataset)

Bases: Ranker

Wrapper class for running the ranking models available in the RankLib library. For details on the available models and their parameters, see the official documentation: https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/

assign_judgement(x, th, cols)

Assigns judgement scores based on relevance ranking.

This method assigns a judgement score to each document based on its relevance to a query.

:param x: pandas.DataFrame

The subset of data belonging to a single query.

:param th: float

The threshold for classifying relevance.

:param cols: list

The list of feature columns.

:return: pandas.DataFrame

The data with the assigned judgement scores.

create_ranklib_data(cols, data, out_dir, split)

Formats and writes data for RankLib.

This method prepares the data by formatting it according to RankLib’s required format and writes it to a text file.

:param cols: list

The list of feature columns.

:param data: pandas.DataFrame

The data to be written to a text file.

:param out_dir: Path

The output directory where the file will be saved.

:param split: str

The type of data, either “train” or “test”.
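
For orientation, RankLib reads a LETOR/SVMlight-style text file with one document per line: a relevance label, `qid:<query id>`, then `<index>:<value>` feature pairs, optionally followed by a `#` comment. The sketch below illustrates that layout only and is not the library's implementation. The helper name `to_ranklib_lines` and the column names `judgement`, `qid`, and `doc_id` are assumptions chosen for the example; rows are plain dictionaries, as produced for instance by `DataFrame.to_dict("records")`.

```python
def to_ranklib_lines(rows, cols, label_col="judgement", query_col="qid", id_col="doc_id"):
    """Return one RankLib-formatted line per row (hypothetical helper).

    `rows` is an iterable of dicts, e.g. data.to_dict("records").
    """
    lines = []
    # Keep the documents of each query on consecutive lines;
    # sorted() is stable, so the order within a query is preserved.
    for row in sorted(rows, key=lambda r: r[query_col]):
        # Feature indices are 1-based in the RankLib input format.
        features = " ".join(f"{i}:{row[c]}" for i, c in enumerate(cols, start=1))
        lines.append(f"{int(row[label_col])} qid:{row[query_col]} {features} # {row[id_col]}")
    return lines


# Made-up rows, for illustration only.
rows = [
    {"qid": 2, "doc_id": "c", "judgement": 1, "f1": 1.0, "f2": 4.0},
    {"qid": 1, "doc_id": "a", "judgement": 1, "f1": 0.5, "f2": 3.0},
    {"qid": 1, "doc_id": "b", "judgement": 0, "f1": 0.25, "f2": 2.0},
]
lines = to_ranklib_lines(rows, cols=["f1", "f2"])
```

Writing the result with `"\n".join(lines)` to a file such as `train.txt` or `test.txt` under `out_dir` would then give RankLib an input it can parse; the actual file names used by the wrapper are not stated here.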

generate_ranklib_data(data_train, data_test, run)

Generates data formatted for RankLib training and testing.

This method prepares the training and testing data for RankLib by generating the required feature matrix and label information in a format that RankLib can process.

:param data_train: pandas.DataFrame

The training dataset.

:param data_test: pandas.DataFrame

The testing dataset.

:param run: str

The identifier for the current run.

predict(data, run, file_name)

Generates predictions using the trained RankLib model.

This method reads the predictions from the trained model and saves them as a CSV file.

:param data: pandas.DataFrame

The dataset on which predictions need to be made.

:param run: str

The identifier for the current run.

:param file_name: str

The file name to save the predictions as a CSV.

:return: pandas.DataFrame

A DataFrame containing the predictions.

read_predictions(data, run)

Retrieves LTR predictions for the dataset.

This method loads the predictions from the trained RankLib model.

:param data: pandas.DataFrame

The dataset for which predictions need to be made.

:param run: str

The identifier for the run.

:return: pandas.DataFrame

The dataset with predictions added.

train(data_train, data_test, run)

Trains ranking models using RankLib.

This method generates RankLib-compatible training data and then runs the RankLib training script.

:param data_train: pandas.DataFrame

The training dataset to be used for training the ranking model.

:param data_test: pandas.DataFrame

The testing dataset to be used for evaluating the ranking model.

:param run: str

The identifier for the current training run.
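
As a rough illustration of what a RankLib training run involves, the sketch below assembles a command line from options described in the RankLib documentation (`-train`, `-test`, `-ranker`, `-metric2t`, `-save`). It only builds the argument list and does not run anything. The function name `build_ranklib_train_command`, the jar location, the directory layout, and the file names are all assumptions; the wrapper's actual invocation may differ.

```python
from pathlib import Path


def build_ranklib_train_command(jar_path, train_file, test_file, model_file,
                                ranker=6, metric="NDCG@10"):
    """Assemble (but do not run) a RankLib training command (hypothetical helper)."""
    return [
        "java", "-jar", str(jar_path),
        "-train", str(train_file),   # training data in RankLib text format
        "-test", str(test_file),     # held-out data evaluated after training
        "-ranker", str(ranker),      # 6 selects LambdaMART in RankLib
        "-metric2t", metric,         # metric optimised on the training data
        "-save", str(model_file),    # where the trained model is written
    ]


out_dir = Path("runs") / "0"  # assumed layout, for illustration only
cmd = build_ranklib_train_command(
    "RankLib.jar", out_dir / "train.txt", out_dir / "test.txt", out_dir / "model.txt"
)
```

A list such as `cmd` could then be passed to `subprocess.run(cmd, check=True)` on a machine with Java and the RankLib jar available.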

fairdiverse.search.ranker_model.ranklib_ranker.get_LTR_predict(data, out_dir, ranker, score_col, query_col, id_col)

Fetches RankLib prediction scores.

This method loads prediction scores from the model and merges them with the provided dataset.

:param data: pandas.DataFrame

The dataset that needs the predictions.

:param out_dir: Path

The directory where the RankLib predictions are stored.

:param ranker: str

The name of the ranking model used.

:param score_col: str

The column name of the score in the dataset.

:param query_col: str

The column representing queries.

:param id_col: str

The unique identifier for each data point.

:return: pandas.DataFrame

The dataset with added prediction scores.
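
The merge step can be pictured as follows. This is a sketch rather than the library's code: it assumes the scores are already available as a dictionary keyed by the values in `id_col`, and the helper name `attach_scores` and the example column names are hypothetical.

```python
import pandas as pd


def attach_scores(data, scores, score_col, id_col):
    """Return a copy of `data` with `score_col` filled from `scores` (hypothetical helper)."""
    out = data.copy()
    # Look up each row's ID in the score dictionary; IDs without a prediction get NaN.
    out[score_col] = out[id_col].map(scores)
    return out


# Made-up data and scores, for illustration only.
data = pd.DataFrame({"qid": [1, 1, 2], "doc_id": ["a", "b", "c"]})
scored = attach_scores(data, {"a": 0.2, "b": 0.9, "c": 0.4},
                       score_col="pred", id_col="doc_id")
```

Working on a copy leaves the caller's DataFrame unchanged, which keeps the step safe to repeat for several rankers.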

fairdiverse.search.ranker_model.ranklib_ranker.get_prediction_scores(pred_path)

Retrieves prediction scores from the latest RankLib experiment.

This method reads the predictions generated from the latest experiment and returns them.

:param pred_path: Path

The directory containing the prediction files.

:return: dict

A dictionary mapping document IDs to predicted scores.
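
A minimal sketch of this kind of lookup, assuming each prediction file is the tab-separated output of RankLib's `-score` option (query ID, document position within the query, score) and that "latest" means the most recently modified file. The helper name `read_latest_scores`, the `*.txt` pattern, and the `(qid, position)` dictionary keys are assumptions; the library may locate files and key the scores by document ID in a different way.

```python
import tempfile
from pathlib import Path


def read_latest_scores(pred_path, pattern="*.txt"):
    """Parse the most recently modified score file in `pred_path` (hypothetical helper)."""
    files = sorted(Path(pred_path).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not files:
        raise FileNotFoundError(f"no prediction files matching {pattern!r} in {pred_path}")
    scores = {}
    with files[-1].open() as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            qid, position, score = parts
            scores[(qid, int(position))] = float(score)
    return scores


# Usage with a throwaway directory and made-up scores.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "predictions.txt").write_text("1\t0\t0.75\n1\t1\t-0.5\n2\t0\t1.25\n")
    scores = read_latest_scores(tmp)
```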