RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

About

We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while substantially mitigating hallucination. In particular, the proposed method, *retrieval-augmented thoughts* (RAT), first generates an initial zero-shot chain of thought (CoT) and then revises each thought step one by one using retrieved information relevant to the task query and to the current and past thought steps. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks, with average relative increases in rating scores of 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT

Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang • 2024
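
The revision loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rat_revise` and the three callables (`draft_steps`, `retrieve`, `revise`) are hypothetical placeholders for an LLM drafting call, a retriever, and an LLM revision call. The sketch also assumes that the "past thought steps" used in each query are the already revised ones.

```python
from typing import Callable, List


def rat_revise(
    task: str,
    draft_steps: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    revise: Callable[[str, str, List[str], List[str]], str],
) -> List[str]:
    """Revise a zero-shot chain of thought one step at a time.

    All three callables are hypothetical stand-ins:
      draft_steps(task)                -> initial zero-shot CoT as a list of steps
      retrieve(query)                  -> documents relevant to the query
      revise(task, step, past, docs)   -> revised version of `step`
    """
    steps = draft_steps(task)  # initial zero-shot CoT
    revised: List[str] = []
    for step in steps:
        # The retrieval query combines the task, the past steps
        # (assumed here to be the revised ones), and the current step.
        query = "\n".join([task, *revised, step])
        docs = retrieve(query)
        # Pass a copy so the revision call cannot mutate our history.
        revised.append(revise(task, step, list(revised), docs))
    return revised


if __name__ == "__main__":
    # Toy stubs in place of a real LLM and retriever.
    result = rat_revise(
        "Write a sorting function",
        draft_steps=lambda task: ["pick an algorithm", "write the code"],
        retrieve=lambda query: ["(retrieved text for: " + query.splitlines()[-1] + ")"],
        revise=lambda task, step, past, docs: step + " " + docs[0],
    )
    for line in result:
        print(line)
```

Injecting the model and retriever as plain callables keeps the loop independent of any particular LLM or search backend, so the same control flow can be tested with stubs like the ones above.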

Related benchmarks

Task	Dataset	Result
Question Answering	2WikiMQA	--	66
Open-domain Question Answering	HotpotQA in-domain	F1 Score 53.8	57
Open-domain Question Answering	MuSiQue (out-of-domain)	F1 29	57
Open-domain Question Answering	2WikiMultiHopQA in-domain	F1 Score 45.7	57
Mathematical Reasoning	MATH	Math500 Score 74.4	41
Reasoning	MMLU-Pro	History Score 57.5	40
Medical Reasoning	Medicine MedQA M-Med	MedQA Score 74.4	40
Open-domain QA	Bambogle v1 (out-of-domain)	F1 Score 53	33
Mathematical Reasoning	Math Math500 Minerva	Math500 Score 77.5	28
Open-domain Question Answering	Bamboogle (out-of-domain)	F1 60.3	24

Showing 10 of 18 rows
