
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

About

We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while greatly mitigating hallucination. In particular, the proposed method, *retrieval-augmented thoughts* (RAT), first generates an initial zero-shot CoT and then revises each thought step one by one, using retrieved information relevant to the task query and to the current and past thought steps. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks, with average relative increases in rating scores of 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT
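The revision loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `rat_revise`, `toy_retrieve`, `toy_revise`, and the tiny `CORPUS` are hypothetical names introduced here, and the stubs stand in for a real retriever and a real LLM call. The sketch also assumes that the "past thought steps" used for retrieval are the already-revised ones, which the abstract does not spell out.

```python
from typing import Callable, Dict, List

# Hypothetical signatures: a retriever maps a text query to passages, and a
# reviser rewrites one step given the task query, earlier steps, and passages.
Retriever = Callable[[str], List[str]]
Reviser = Callable[[str, List[str], str, List[str]], str]


def rat_revise(query: str, draft_steps: List[str],
               retrieve: Retriever, revise: Reviser) -> List[str]:
    """Revise an initial chain of thoughts one step at a time."""
    revised: List[str] = []
    for step in draft_steps:
        # Build the retrieval query from the task query, the past steps
        # (assumed here to be the revised ones), and the current step.
        retrieval_query = "\n".join([query, *revised, step])
        passages = retrieve(retrieval_query)
        revised.append(revise(query, list(revised), step, passages))
    return revised


# Toy stand-ins so the sketch runs without any external service.
CORPUS: Dict[str, str] = {
    "pickaxe": "A wooden pickaxe is crafted from planks and sticks.",
    "planks": "Planks are crafted from logs.",
}


def toy_retrieve(text: str) -> List[str]:
    # Return every corpus entry whose keyword occurs in the query text.
    lowered = text.lower()
    return [doc for key, doc in CORPUS.items() if key in lowered]


def toy_revise(query: str, past: List[str], step: str,
               passages: List[str]) -> str:
    # A real reviser would prompt an LLM; this stub only records how many
    # passages were available when the step was revised.
    return f"{step} [{len(passages)} passage(s)]"


draft = ["Collect logs.", "Turn logs into planks.", "Craft the pickaxe."]
revised_steps = rat_revise("How do I craft a wooden pickaxe?", draft,
                           toy_retrieve, toy_revise)
for line in revised_steps:
    print(line)
```

Passing the retriever and reviser in as callables keeps the loop independent of any particular model or search backend, so either can be swapped without touching the step-by-step logic.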

Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Question Answering | 2WikiMQA | -- | 66 |
| Open-domain Question Answering | HotpotQA in-domain | F1 Score 53.8 | 57 |
| Open-domain Question Answering | MuSiQue (out-of-domain) | F1 29 | 57 |
| Open-domain Question Answering | 2WikiMultiHopQA in-domain | F1 Score 45.7 | 57 |
| Mathematical Reasoning | MATH | Math500 Score 74.4 | 41 |
| Reasoning | MMLU-Pro | History Score 57.5 | 40 |
| Medical Reasoning | Medicine MedQA M-Med | MedQA Score 74.4 | 40 |
| Open-domain QA | Bambogle v1 (out-of-domain) | F1 Score 53 | 33 |
| Mathematical Reasoning | Math Math500 Minerva | Math500 Score 77.5 | 28 |
| Open-domain Question Answering | Bamboogle (out-of-domain) | F1 60.3 | 24 |
Showing 10 of 18 rows
