
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

About

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks. It designs a simple prompt like "Let's think step by step" or multiple in-context exemplars with well-crafted rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often contain mistakes, producing unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, which elicits LLMs to generate explicit pieces of knowledge evidence in the form of structured triples. This is inspired by human behavior: before answering a complex question, we can draw a mind map or knowledge map in our heads as reasoning evidence. Benefiting from CoK, we additionally introduce an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For unreliable responses, the wrong evidence can be flagged to prompt the LLM to rethink. Extensive experiments demonstrate that our method further improves performance on commonsense, factual, symbolic, and arithmetic reasoning tasks.
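The abstract describes CoK prompting as prefacing each rationale with explicit evidence triples. A minimal sketch of how such a prompt might be assembled is below; the exemplar content, triple format, and helper names are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of Chain-of-Knowledge (CoK) prompt construction.
# Exemplar text and the (subject, relation, object) triple format
# are assumptions for illustration, not the paper's actual prompts.

from dataclasses import dataclass


@dataclass
class CoKExemplar:
    question: str
    triples: list      # evidence as (subject, relation, object) tuples
    rationale: str
    answer: str


def format_triples(triples):
    """Render evidence triples, one per line, as (subject, relation, object)."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


def build_cok_prompt(exemplars, question):
    """Assemble an in-context CoK prompt: each exemplar shows explicit
    knowledge evidence (triples) before its rationale and answer, then
    the new question is appended for the LLM to complete."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex.question}\n"
            f"Evidence triples:\n{format_triples(ex.triples)}\n"
            f"Rationale: {ex.rationale}\n"
            f"A: {ex.answer}\n"
        )
    parts.append(f"Q: {question}\nEvidence triples:")
    return "\n".join(parts)


exemplar = CoKExemplar(
    question="Do hamsters provide food for any animals?",
    triples=[("hamster", "is_a", "prey animal"),
             ("prey animal", "is_food_for", "predators")],
    rationale="Hamsters are prey animals, and prey are food for predators.",
    answer="yes",
)

prompt = build_cok_prompt([exemplar], "Is a pound of feathers heavier than a pound of bricks?")
print(prompt)
```

The key design point the abstract motivates: because the evidence is emitted as discrete triples rather than free-form prose, each triple can later be checked individually (the factuality side of F^2-Verification), and the rationale can be checked against the triples it cites (the faithfulness side).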

Jianing Wang, Qiushi Sun, Xiang Li, Ming Gao • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Arithmetic Reasoning | MultiArith | Accuracy | 99.3 | 181
Arithmetic Reasoning | GSM8K | Accuracy | 88.2 | 155
Commonsense Reasoning | CommonsenseQA | Accuracy | 79.3 | 132
Commonsense Reasoning | BoolQ | Accuracy | 69.9 | 131
Question Answering | OpenBookQA (OBQA) (test) | OBQA Accuracy | 86.9 | 130
Commonsense Reasoning | StrategyQA | Accuracy | 67.9 | 125
Question Answering | MedQA (test) | Accuracy | 72.2 | 61
Mathematical Reasoning | AQUA-RAT | Accuracy | 69.7 | 57
Question Answering | CommonsenseQA IH (test) | Accuracy | 73.9 | 57
Question Answering | CommonsenseQA IH (dev) | Accuracy | 75.9 | 53

Showing 10 of 18 rows
