GroupRank: A Groupwise Paradigm for Effective and Efficient Passage Reranking with LLMs

About

Large Language Models (LLMs) have emerged as powerful tools for passage reranking in information retrieval, leveraging their superior reasoning capabilities to address the limitations of conventional models on complex queries. However, current LLM-based reranking paradigms are fundamentally constrained by an efficiency-accuracy trade-off: (1) pointwise methods are efficient but ignore inter-document comparison, yielding suboptimal accuracy; (2) listwise methods capture global context but suffer from context-window constraints and prohibitive inference latency. To address these issues, we propose GroupRank, a novel paradigm that balances flexibility and context awareness. To unlock the full potential of groupwise reranking, we propose an answer-free data synthesis pipeline that fuses local pointwise signals with global listwise rankings. These samples facilitate supervised fine-tuning and reinforcement learning, with the latter guided by a specialized group-ranking reward comprising ranking-utility and group-alignment. These complementary components synergistically optimize document ordering and score calibration to reflect intrinsic query-document relevance. Experimental results show GroupRank achieves a state-of-the-art 65.2 NDCG@10 on BRIGHT and surpasses baselines by 2.1 points on R2MED, while delivering a 6.4× inference speedup.
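
The abstract does not spell out the groupwise procedure, so the following is only a minimal sketch of one plausible reading: candidates are split into small groups, each group is scored jointly against the query, and all documents are then sorted by score. The names `groupwise_rerank`, `score_group`, `group_size`, and `overlap_scorer` are hypothetical, and the toy term-overlap scorer merely stands in for an LLM call; none of this is taken from the paper's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature (an assumption, not the paper's API):
# takes a query and a group of documents, returns one score per document.
GroupScorer = Callable[[str, Sequence[str]], List[float]]


def groupwise_rerank(
    query: str,
    docs: Sequence[str],
    score_group: GroupScorer,
    group_size: int = 4,
) -> List[Tuple[str, float]]:
    """Score documents group by group, then sort all documents by score."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    scored: List[Tuple[str, float]] = []
    for start in range(0, len(docs), group_size):
        group = docs[start:start + group_size]
        scores = score_group(query, group)
        if len(scores) != len(group):
            raise ValueError("scorer must return one score per document")
        scored.extend(zip(group, scores))
    # sorted() is stable, so tied documents keep their candidate order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def overlap_scorer(query: str, group: Sequence[str]) -> List[float]:
    """Toy stand-in for an LLM: counts query terms shared with each document."""
    terms = set(query.lower().split())
    return [float(len(terms & set(doc.lower().split()))) for doc in group]
```

In this sketch each scorer call sees only `group_size` documents, which keeps the prompt short in the way a pointwise method would, while still letting the scorer compare documents within a group. Sorting across groups is only meaningful if scores are comparable between groups, which is presumably what the score calibration mentioned in the abstract is meant to provide.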

Meixiu Long, Duolin Sun, Dan Yang, Yihan Jiao, Lei Liu, Jiahai Wang, BinBin Hu, Yue Shen, Jie Feng, Zhehao Tan, Junjie Wang, Lianzhen Zhong, Jian Wang, Peng Wei, Jinjie Gu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Retrieval | HotpotQA | R@5 | 90.6 | 36 |
| Document Reranking | BEIR | Average NDCG@10 | 55.1 | 12 |
| Passage Reranking | BRIGHT | NDCG@10 (Avg) | 38 | 12 |
| Reranking | R2MED (test) | Average Score | 52.3 | 12 |
| Retrieval | MuSiQue | Recall@5 | 65.08 | 10 |
| Retrieval | DetectiveQA | Recall@3 | 29.34 | 8 |
| Retrieval | NarrativeQA | Recall@3 | 23.98 | 8 |
| Retrieval | Overall (Musique, HotpotQA, NarrativeQA, DetectiveQA) | Avg Recall@3 | 47.82 | 8 |
| Retrieval and Reranking | LoCoMo (test) | Recall@3 | 77.99 | 5 |
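
For readers unfamiliar with the metrics reported here, the sketch below shows the textbook definitions of Recall@k and NDCG@k. It assumes linear gains and computes the ideal ordering from the same ranked list; individual benchmarks may use other variants (for example exponential gains, or an ideal ranking built from all judged documents). The listed results appear to be on a 0 to 100 scale, whereas these functions return values between 0 and 1. The function names are illustrative.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: rank r (1-based) is discounted by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by the DCG of the best ordering.

    Assumption: `gains` holds the relevance grade of each ranked document,
    and the ideal ordering is obtained by sorting that same list.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `recall_at_k(["a", "b", "c"], {"b", "x"}, 3)` counts one hit out of two relevant documents, and `ndcg_at_k` returns 1.0 whenever the ranked list is already sorted by relevance grade.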