Active Learners as Efficient PRP Rerankers
About
Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM and aggregates them into a ranking, usually with a classical sorting algorithm. However, the judgments are noisy, order-sensitive, and sometimes intransitive, which violates the assumptions that sorting relies on. Moreover, because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We therefore reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. The oracle converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
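The randomized-direction idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: `make_biased_judge` is a hypothetical toy stand-in for an LLM call that favors whichever item is shown first, and `randomized_direction_oracle`, `win_rate`, and all parameter values are names and numbers chosen here for illustration. For two equally relevant items, a fixed presentation order inherits the position bias, while flipping the order with a fair coin and mapping the answer back leaves a win rate near one half.

```python
import random
from typing import Callable, Dict

# Hypothetical comparator type: compare(first, second) returns True when the
# judge prefers the item shown FIRST. It stands in for a single LLM call.
Compare = Callable[[str, str], bool]


def make_biased_judge(scores: Dict[str, float], first_bias: float,
                      rng: random.Random) -> Compare:
    """Toy judge (an assumption): noisy, with a constant bias toward slot one."""
    def compare(first: str, second: str) -> bool:
        p_first = 0.5 + 0.5 * (scores[first] - scores[second]) + first_bias
        p_first = min(1.0, max(0.0, p_first))  # keep it a valid probability
        return rng.random() < p_first
    return compare


def randomized_direction_oracle(compare: Compare, a: str, b: str,
                                rng: random.Random) -> bool:
    """Return True if `a` is judged better than `b`, using one call per pair."""
    if rng.random() < 0.5:
        return compare(a, b)      # a shown first: answer is used as-is
    return not compare(b, a)      # b shown first: invert the answer


def win_rate(vote: Callable[[], bool], trials: int) -> float:
    """Fraction of trials in which the vote came out True."""
    return sum(vote() for _ in range(trials)) / trials


rng = random.Random(0)
# Two equally relevant items, so any deviation from 0.5 is pure position bias.
judge = make_biased_judge({"a": 0.5, "b": 0.5}, first_bias=0.2, rng=rng)

fixed = win_rate(lambda: judge("a", "b"), 20000)
randomized = win_rate(
    lambda: randomized_direction_oracle(judge, "a", "b", rng), 20000)
print(f"fixed order: {fixed:.3f}, randomized direction: {randomized:.3f}")
```

In this toy model the fixed-order win rate for `a` sits close to 0.7 (0.5 plus the bias), whereas the randomized oracle averages the two directions and lands close to 0.5. Each individual answer is still noisy, which is why the output is meant to be aggregated by a noise-robust ranker rather than trusted on its own.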
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reranking | TREC DL 2020 | NDCG@10 | 0.6766 | 132 |
| Reranking | TREC DL 2019 v1 (test) | NDCG@10 | 69.47 | 108 |
| Reranking | TREC DL 2019 (test) | NDCG@10 | 70.98 | 108 |
| Document Reranking | TREC DL 2019 and 2020 (test) | NDCG@10 | 68.25 | 108 |
| Document Reranking | TREC DL 19 | NDCG@10 | 70.98 | 39 |
| Information Retrieval | TREC DL 2020 (test) | NDCG@10 | 0.6896 | 25 |
| Reranking | BEIR (test) | Covid Score | 78.5 | 19 |
| Information Retrieval | TREC DL 2019 (test) | NDCG@10 | 69.47 | 19 |
| End-to-end Reranking | TREC DL 2019 2020 Average | Average NDCG@10 | 68.92 | 10 |
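
For readers unfamiliar with the metric reported above, NDCG@10 can be sketched as follows. This is a minimal illustration assuming linear gain and a log2 position discount; evaluation tools differ on the gain function, and the ideal ranking here is built only from the labels passed in, whereas a full evaluation would use all judged documents for the query. The function names are chosen here for illustration. Note that the table mixes two scales for the same metric: values such as 0.6766 are on a 0 to 1 scale, while values such as 69.47 are on a 0 to 100 scale.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance labels of a reranked list, in ranked order (made-up values).
ranked_labels = [3, 2, 3, 0, 1, 2]
print(f"NDCG@10 = {ndcg_at_k(ranked_labels, k=10):.4f}")
```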