
RAFT: Adapting Language Model to Domain Specific RAG

About

Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake new knowledge (e.g., time-critical news or private domain knowledge) into the pretrained model, either through RAG-based prompting or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book" in-domain setting. In RAFT, given a question and a set of retrieved documents, we train the model to ignore the documents that don't help answer the question, which we call distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This, coupled with RAFT's chain-of-thought-style responses, helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across the PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe for adapting pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.
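The recipe described in the abstract can be sketched as training-data construction: each example pairs a question with a mix of distractor documents (and, for a fraction of examples, the relevant "oracle" document), and the target is a chain-of-thought answer that quotes the oracle document verbatim. The following is a minimal sketch of that assembly step; `build_raft_example` and its parameters (e.g., `p_oracle` for the fraction of examples that include the oracle document) are hypothetical names, not the authors' released code.

```python
import random

def build_raft_example(question, oracle_doc, distractor_docs,
                       cot_answer, p_oracle=0.8, seed=None):
    """Assemble one RAFT-style fine-tuning example (hypothetical helper).

    With probability p_oracle the oracle (relevant) document is placed in
    the context alongside the distractors; otherwise the context contains
    distractors only, which pushes the model to learn the in-domain answer
    rather than always copying from context. The target `cot_answer` is a
    chain-of-thought response that cites the oracle document verbatim.
    """
    rng = random.Random(seed)
    docs = list(distractor_docs)
    if rng.random() < p_oracle:
        docs.append(oracle_doc)
    rng.shuffle(docs)  # the model must not learn a positional shortcut
    context = "\n\n".join(f"[Document {i + 1}]\n{d}"
                          for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}"
    return {"prompt": prompt, "answer": cot_answer}
```

At fine-tuning time, each `{"prompt", "answer"}` pair would be fed to a standard supervised fine-tuning loop; the exact oracle fraction and citation markup used by the authors are specified in the paper itself.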

Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score | 51.6 | 294 |
| Multi-hop Question Answering | 2Wiki | Exact Match | 39.4 | 152 |
| Question Answering | PubMedQA | Accuracy | 73.3 | 145 |
| Question Answering | PubMedQA (test) | Accuracy | 73.3 | 128 |
| Retrieval-Augmented Generation | LOFT | NQ Score | 70 | 42 |
| Retrieval-Augmented Generation | ICR2 | NQ Score | 59 | 37 |
| Retrieval-Augmented Question Answering | RAGQA Leaderboard (test) | AVG Score | 81.6 | 29 |
| Multi-hop Question Answering | MuSiQue | Exact Match (EM) | 13.8 | 24 |
| Requirements Engineering | EDA requirements engineering | F1 Score (Ref) | 67.31 | 21 |
| Retrieval | MisstepMath | Cosine Similarity | 61.36 | 16 |

Showing 10 of 20 rows.
