Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions

About

Retrieval-Augmented Generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs) by incorporating external knowledge. However, current RAG methods face two limitations: (1) they only cover limited RAG scenarios. (2) They suffer from limited task diversity due to the lack of a general RAG dataset. To address these limitations, we propose RAG-Instruct, a general method for synthesizing diverse and high-quality RAG instruction data based on any source corpus. Our approach leverages (1) five RAG paradigms, which encompass diverse query-document relationships, and (2) instruction simulation, which enhances instruction diversity and quality by utilizing the strengths of existing instruction datasets. Using this method, we construct a 40K instruction dataset from Wikipedia, comprehensively covering diverse RAG scenarios and tasks. Experiments demonstrate that RAG-Instruct effectively enhances LLMs' RAG capabilities, achieving strong zero-shot performance and significantly outperforming various RAG baselines across a diverse set of tasks. RAG-Instruct is publicly available at https://github.com/FreedomIntelligence/RAG-Instruct.

Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang• 2024

Related benchmarks

TaskDatasetResultRank
Question AnsweringARC
Accuracy85.1
230
Question AnsweringHotpotQA
F162.9
128
Question AnsweringTQA
Accuracy83.8
74
Question AnsweringPopQA
Accuracy (Acc)70.4
26
Question AnsweringPub
Accuracy82.8
22
Showing 5 of 5 rows

Other info

Follow for update