
Chain-of-Verification Reduces Hallucination in Large Language Models

About

Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-check its draft; (iii) answers those questions independently so the answers are not biased by other responses; and (iv) generates its final verified response. In experiments, we show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata to closed-book MultiSpanQA and longform text generation.

Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston • 2023
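A minimal sketch of the four CoVe stages in Python, assuming a hypothetical text-completion callable `llm` that maps a prompt string to a completion string (any LLM API wrapper would do). The prompt wording here is illustrative, not the paper's exact prompts; answering each verification question in isolation from the draft corresponds to the independence described in step (iii).

```python
from typing import Callable, List


def chain_of_verification(query: str, llm: Callable[[str], str]) -> str:
    """Sketch of the four CoVe stages; `llm` maps a prompt to a completion."""
    # (i) Draft an initial baseline response.
    draft = llm(f"Answer the following question.\n\nQuestion: {query}\nAnswer:")

    # (ii) Plan verification questions that fact-check the draft,
    # one question per line.
    plan = llm(
        "Write one fact-checking question per line for each factual claim "
        f"in this answer.\n\nQuestion: {query}\nAnswer: {draft}\n\n"
        "Verification questions:"
    )
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # (iii) Answer each verification question independently, without showing
    # the draft, so the checks are not biased by the original response.
    checks = [
        (q, llm(f"Answer concisely.\n\nQuestion: {q}\nAnswer:"))
        for q in questions
    ]

    # (iv) Generate the final verified response, conditioned on the draft
    # and the independently verified answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        "Revise the draft answer so it is consistent with the verified facts.\n\n"
        f"Question: {query}\nDraft: {draft}\nVerified facts:\n{evidence}\n\n"
        "Final answer:"
    )
```

Keeping stage (iii) a separate call per question, with only the question in context, is what distinguishes this from simply asking the model to re-check its own answer in one pass.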

Related benchmarks

Task                         Dataset        Result                   Rank
Mathematical Reasoning       GSM8K          Accuracy 93.6            1362
Commonsense Reasoning        CSQA           Accuracy 86              366
Mathematical Reasoning       MathQA         Accuracy 84              305
Mathematical Reasoning       AIME           AIME Accuracy 45         288
Question Answering           GPQA           Accuracy 52              258
Mathematical Reasoning       AMC 23         Accuracy 72.5            198
Mathematical Reasoning       MATH L5        Accuracy 0.56            90
Scientific Reasoning         GPQA           Accuracy 65.4            75
Troop placement prediction   Risk           EMD 0.56                 66
Question Answering           SQuAD (test)   GPT Judge Accuracy 58    45

Showing 10 of 34 rows.
