
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

About

Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework, RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well when a small fraction of ranking data is added to the training blend, and outperform existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-04-09, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and the GPT-4 models on nine knowledge-intensive benchmarks. In addition, it performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating strong generalization to new domains.
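The two-stage inference flow described in the abstract can be sketched as follows. Note this is a minimal illustration, not the authors' implementation: the `score_context` and `generate_answer` helpers are hypothetical stand-ins for the two prompts sent to the single instruction-tuned LLM (here replaced by word-overlap scoring and a trivial echo generator so the sketch is self-contained and runnable).

```python
# Sketch of the RankRAG inference flow: a single model is queried twice --
# once to score each retrieved context for relevance to the question, once
# to generate an answer conditioned on only the top-ranked contexts.
# Both helpers below are toy stand-ins for the real LLM calls.

def score_context(question: str, context: str) -> float:
    """Stand-in for the ranking prompt: fraction of question words in context."""
    q_words = set(question.lower().split())
    c_words = set(context.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def generate_answer(question: str, contexts: list[str]) -> str:
    """Stand-in for the generation prompt: echo the best-ranked context."""
    return contexts[0]

def rank_rag(question: str, retrieved: list[str], keep: int = 2) -> str:
    # Stage 1: rerank the retriever's top-k contexts with the same model.
    ranked = sorted(retrieved,
                    key=lambda c: score_context(question, c),
                    reverse=True)
    # Stage 2: generate an answer from only the top `keep` contexts,
    # rather than feeding the full retrieved list to the generator.
    return generate_answer(question, ranked[:keep])

if __name__ == "__main__":
    contexts = [
        "The Eiffel Tower is in Paris.",
        "Bananas are yellow.",
        "Paris is the capital of France.",
    ]
    print(rank_rag("What is the capital of France?", contexts))
```

The key design point the paper argues is that both stages are served by one instruction-tuned LLM, so in a faithful implementation both helpers would issue prompts to the same model rather than to a separate expert reranker.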

Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | ARC Challenge | Accuracy | 70.6 | 749
Multi-hop Question Answering | 2WikiMultihopQA | EM | 38.2 | 278
Question Answering | OBQA | Accuracy | 87.5 | 276
Multi-hop Question Answering | HotpotQA | F1 Score | 63.6 | 221
Question Answering | PopQA | Accuracy | 66.1 | 186
Question Answering | PubMedQA | Accuracy | 79.8 | 145
Question Answering | TriviaQA | Accuracy | 92.3 | 85
Question Answering | 2Wiki | F1 | 60 | 75
Question Answering | ARC-C | Accuracy | 0.696 | 68
Fact Verification | FEVER | Accuracy | 0.938 | 67
Showing 10 of 20 rows
