Re-Rankers as Relevance Judges

About

Using large language models (LLMs) to predict relevance judgments has shown promising results. Most studies treat this task as a distinct research line, e.g., focusing on prompt design for predicting relevance labels given a query and passage. However, predicting relevance judgments is essentially a form of relevance prediction, a problem extensively studied in tasks such as re-ranking. Despite this potential overlap, little research has explored reusing or adapting established re-ranking methods to predict relevance judgments, leading to potential resource waste and redundant development. To bridge this gap, we reproduce re-rankers in a re-ranker-as-relevance-judge setup. We design two adaptation strategies: (i) using binary tokens (e.g., "true" and "false") generated by a re-ranker as direct judgments, and (ii) converting continuous re-ranking scores into binary labels via thresholding. We perform extensive experiments on TREC-DL 2019 to 2023 with 8 re-rankers from 3 families, ranging from 220M to 32B parameters, and analyse the evaluation bias exhibited by re-ranker-based judges. Results show that re-ranker-based relevance judges, under both strategies, can outperform UMBRELA, a state-of-the-art LLM-based relevance judge, in around 40% to 50% of the cases; they also exhibit strong self-preference towards their own and same-family re-rankers, as well as cross-family bias.
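The two adaptation strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the re-ranker exposes the logits it assigns to its "true" and "false" output tokens (strategy i), or otherwise produces one continuous relevance score per query-passage pair (strategy ii). The function names are hypothetical. The threshold-selection rule in `tune_threshold` is also a hypothetical choice: it picks the accuracy-maximizing threshold on a small labeled set, and the paper's own calibration procedure may differ.

```python
from __future__ import annotations

import math
from typing import Sequence


def judge_from_tokens(logit_true: float, logit_false: float) -> int:
    """Strategy (i): use the re-ranker's binary output token as the judgment.

    Hypothetical interface: assumes the logits for the "true" and "false"
    tokens are available. Returns 1 (relevant) if "true" is the more
    likely token, else 0 (non-relevant).
    """
    return 1 if logit_true > logit_false else 0


def prob_relevant(logit_true: float, logit_false: float) -> float:
    """Softmax over the two tokens, giving a continuous score in [0, 1]."""
    m = max(logit_true, logit_false)  # subtract the max for numerical stability
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def judge_from_scores(scores: Sequence[float], threshold: float) -> list[int]:
    """Strategy (ii): binarize continuous re-ranking scores at a threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def tune_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical threshold choice: the observed score that maximizes
    accuracy against a small set of labeled (query, passage) pairs."""
    best_t, best_acc = scores[0], -1.0
    for t in sorted(set(scores)):
        preds = judge_from_scores(scores, t)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t


# Toy usage with made-up numbers (not results from the paper).
dev_scores = [0.1, 0.4, 0.6, 0.9]
dev_labels = [0, 0, 1, 1]
t = tune_threshold(dev_scores, dev_labels)
print(t)                                   # 0.6
print(judge_from_scores([0.55, 0.7], t))   # [0, 1]
print(judge_from_tokens(2.3, -0.4))        # 1
```

Strategy (i) needs no threshold, because the comparison between the two tokens acts as an implicit one. Strategy (ii) applies to any re-ranker that outputs a score, but it requires choosing a cut-off.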

Chuan Meng, Jiqun Liu, Mohammad Aliannejadi, Fengran Mo, Jeff Dalton, Maarten de Rijke • 2026

Related benchmarks

Task                          Dataset                 Metric          Result  Rank
Relevance Judgment Agreement  TREC DL 2020 (test)     Cohen's Kappa   0.455   9
Relevance Judgment Agreement  TREC-DL 2022 (test)     Cohen's Kappa   0.43    9
System ranking correlation    TREC-DL 22              MAP@100         92.5    9
System ranking correlation    TREC DL 23              MAP@100         90.6    9
Relevance Judgment Agreement  TREC DL 2019 (test)     Cohen's Kappa   0.463   9
Relevance Judgment Agreement  TREC-DL 2021 (test)     Cohen's Kappa   0.482   9
Relevance Judgment Agreement  TREC-DL 2023 (test)     Cohen's Kappa   0.406   9
System ranking correlation    TREC-DL 2019 1 (test)   MAP@100         0.91    9
System ranking correlation    TREC DL 20              MAP@100         0.89    9
System ranking correlation    TREC-DL 21              MAP@100         0.889   9
Showing 10 of 19 rows
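The table reports two kinds of metric. The following sketch shows how each is commonly computed. Cohen's kappa measures agreement between predicted and human relevance labels, corrected for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected if both raters labeled independently at their own rates. The page does not say how "System ranking correlation" is computed. This sketch assumes it is Kendall's tau between system orderings by MAP@100 under human versus predicted judgments. The function names and all numbers are hypothetical. In practice, `sklearn.metrics.cohen_kappa_score` and `scipy.stats.kendalltau` (which computes tau-b by default) cover the same ground.

```python
from collections import Counter
from itertools import combinations
import math
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two label sequences (any discrete
    labels, e.g. binary or graded 0-3 relevance)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # expected agreement if each rater labeled independently at its own rates
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined: both raters use one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists over the same systems;
    tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Toy data (made up, not from the paper).
human = [1, 1, 0, 0]
judge = [1, 0, 0, 0]
print(cohens_kappa(human, judge))  # 0.5

# Per-system MAP@100 under human vs. predicted qrels (hypothetical values).
map_human = [0.30, 0.25, 0.20, 0.10]
map_pred = [0.32, 0.21, 0.24, 0.12]
print(round(kendall_tau_a(map_human, map_pred), 3))  # 0.667
```

Kappa compares judgments label by label. The rank correlation only asks whether the predicted qrels order systems the same way as the human qrels. A judge can therefore score modestly on kappa yet still correlate highly at the system level.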
