Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

About

As large language models (LLMs) have become the norm in NLP, demonstrating good performance in generation and reasoning tasks, one of their most serious weaknesses is a lack of factual correctness. Generating non-factual text not only lowers performance but also degrades the trust in and validity of their applications. Chain-of-Thought (CoT) prompting improves trust and model performance on complex reasoning tasks by generating interpretable reasoning chains, but it still suffers from factuality concerns in knowledge-intensive tasks. In this paper, we propose the Verify-and-Edit framework for CoT prompting, which seeks to increase prediction factuality by post-editing reasoning chains according to external knowledge. Built on top of GPT-3, our framework leads to accuracy improvements on multiple open-domain question-answering tasks.
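
The abstract only states that reasoning chains are post-edited according to external knowledge, so the following is a minimal sketch of what such a loop could look like, not the authors' implementation. Everything named here is hypothetical: the callables `sample_chains`, `retrieve`, `edit_sentence`, and `answer_from` stand in for LLM and retrieval calls, and the agreement gate with its `threshold` is an assumption of this sketch rather than something the abstract specifies.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A chain is (rationale sentences, final answer). Hypothetical type for this sketch.
Chain = Tuple[List[str], str]


def majority_agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def verify_and_edit(
    question: str,
    sample_chains: Callable[[str], List[Chain]],
    retrieve: Callable[[str], str],
    edit_sentence: Callable[[str, str], str],
    answer_from: Callable[[str, List[str]], str],
    threshold: float = 0.5,  # assumed gate: edit only when agreement is at or below this
) -> str:
    chains = sample_chains(question)
    answer, agreement = majority_agreement([a for _, a in chains])
    if agreement > threshold:
        # Sampled chains mostly agree, so keep the prediction unchanged.
        return answer
    # Otherwise take one rationale behind the leading answer and post-edit
    # each sentence against externally retrieved evidence.
    rationale = next(r for r, a in chains if a == answer)
    edited = [edit_sentence(s, retrieve(s)) for s in rationale]
    return answer_from(question, edited)


# Toy stand-ins so the sketch runs without any model or search API.
def toy_sample_chains(question: str) -> List[Chain]:
    return [
        (["Sydney is the capital of Australia."], "Sydney"),
        (["Canberra is the capital of Australia."], "Canberra"),
        (["Melbourne is the capital of Australia."], "Melbourne"),
    ]


def toy_retrieve(sentence: str) -> str:
    # Pretend knowledge base with a single fact.
    if "capital of Australia" in sentence:
        return "Canberra is the capital of Australia."
    return ""


def toy_edit_sentence(sentence: str, evidence: str) -> str:
    # Replace the sentence with the evidence when any was found.
    return evidence or sentence


def toy_answer_from(question: str, rationale: List[str]) -> str:
    # Read the answer off the first word of the last rationale sentence.
    return rationale[-1].split()[0]


result = verify_and_edit(
    "What is the capital of Australia?",
    toy_sample_chains,
    toy_retrieve,
    toy_edit_sentence,
    toy_answer_from,
)
print(result)
```

In this toy run the three sampled chains disagree, so the rationale is rewritten from the retrieved fact before the answer is produced. In a real system each toy function would be replaced by a prompt to the model or a call to a retriever.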

Ruochen Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing • 2023

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 39	559
Multi-hop Question Answering	HotpotQA (test)	F1 29.64	311
Multi-hop Question Answering	HotpotQA	F1 Score 70.16	294
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM 37.2	226
Multi-hop Question Answering	MuSiQue	EM 22	209
Multi-hop Question Answering	MuSiQue (test)	F1 6.5	128
Fact Verification	FEVER	Accuracy 53.9	72
Long-form Question Answering	ELI5	ROUGE-L 23.8	57
Multi-hop Question Answering	StrategyQA (test)	Accuracy 63.07	26
Question Answering	15 Domain-Specific Knowledge Tasks (test)	FiQA Accuracy 76.3	24

Showing 10 of 15 rows
