RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!

About

In information retrieval, proprietary large language models (LLMs) such as GPT-4 and open-source counterparts such as LLaMA and Vicuna have played a vital role in reranking. However, the gap between open-source and closed models persists, with reliance on proprietary, non-transparent models constraining reproducibility. Addressing this gap, we introduce RankZephyr, a state-of-the-art, open-source LLM for listwise zero-shot reranking. RankZephyr not only bridges the effectiveness gap with GPT-4 but in some cases surpasses the proprietary model. Our comprehensive evaluations across several datasets (TREC Deep Learning Tracks; NEWS and COVID from BEIR) showcase this ability. RankZephyr benefits from strategic training choices and is resilient against variations in initial document ordering and the number of documents reranked. Additionally, our model outperforms GPT-4 on the NovelEval test set, comprising queries and passages past its training period, which addresses concerns about data contamination. To foster further research in this rapidly evolving field, we provide all code necessary to reproduce our results at https://github.com/castorini/rank_llm.

Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin • 2023

Related benchmarks

Task                          Dataset                             Result           Rank
Question Answering            HotpotQA                            F1 41            114
Document Ranking              TREC DL Track 2019 (test)           nDCG@10 73.9     96
Question Answering            MuSiQue                             EM 5.2           84
Multi-hop Question Answering  HotpotQA                            F1 38.8          79
Question Answering            2Wiki                               F1 30.5          75
Reranking                     TREC 2020 (test)                    nDCG@10 70.9     55
Multi-hop Question Answering  2Wiki                               F1 Score 29.3    41
Information Retrieval         Scientific QA Base setting          HitRate@1 56.35  38
Ranking                       BEIR selected subset v1.0.0 (test)  TREC-COVID 84    38
Question Answering            Scientific QA Base setting          F1 Score 44.22   38
Showing 10 of 23 rows
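
Several of the reranking results above are reported as nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of how it can be computed for a single query. It assumes linear gain (the relevance grade itself) and a log2 rank discount; some evaluation tools use an exponential gain instead, and this is not the evaluation code used by the paper. The function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k relevance grades.

    Assumes linear gain and a log2(rank + 1) discount, with ranks starting at 1.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, all_gains=None, k=10):
    """nDCG@k for one query.

    ranked_gains: relevance grades of the documents in the order the reranker returned them.
    all_gains: grades of every judged document for the query (hypothetical parameter);
               defaults to ranked_gains when the full judgment pool is not available.
    """
    pool = all_gains if all_gains is not None else ranked_gains
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_gains, k) / ideal_dcg


if __name__ == "__main__":
    # A reranker that places the most relevant passage second instead of first.
    print(round(ndcg_at_k([1, 3, 0, 2]), 4))
```

Benchmark figures such as 73.9 are presumably this per-query score averaged over all queries and scaled by 100.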
