
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

About

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks. It designs a simple prompt like "Let's think step by step" or multiple in-context exemplars with well-crafted rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often contain mistakes, producing unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, which elicits LLMs to generate explicit pieces of knowledge evidence in the form of structured triples. This is inspired by human behavior: before answering a complex question, we can draw a mind map or knowledge map in our heads as reasoning evidence. Benefiting from CoK, we additionally introduce an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For unreliable responses, the wrong evidence can be flagged to prompt the LLM to rethink. Extensive experiments demonstrate that our method further improves performance on commonsense, factual, symbolic, and arithmetic reasoning tasks.
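The abstract describes CoK prompting as prefacing each rationale with explicit evidence triples. A minimal sketch of how such a prompt might be assembled is below; the exemplar content, triple format, and helper names are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of Chain-of-Knowledge (CoK) prompt construction.
# Exemplar text and the (subject, relation, object) triple format
# are assumptions for illustration, not the paper's actual prompts.

from dataclasses import dataclass


@dataclass
class CoKExemplar:
    question: str
    triples: list      # evidence as (subject, relation, object) tuples
    rationale: str
    answer: str


def format_triples(triples):
    """Render evidence triples, one per line, as (subject, relation, object)."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


def build_cok_prompt(exemplars, question):
    """Assemble an in-context CoK prompt: each exemplar shows explicit
    knowledge evidence (triples) before its rationale and answer, then
    the new question is appended for the LLM to complete."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex.question}\n"
            f"Evidence triples:\n{format_triples(ex.triples)}\n"
            f"Rationale: {ex.rationale}\n"
            f"A: {ex.answer}\n"
        )
    parts.append(f"Q: {question}\nEvidence triples:")
    return "\n".join(parts)


exemplar = CoKExemplar(
    question="Do hamsters provide food for any animals?",
    triples=[("hamster", "is_a", "prey animal"),
             ("prey animal", "is_food_for", "predators")],
    rationale="Hamsters are prey animals, and prey are food for predators.",
    answer="yes",
)

prompt = build_cok_prompt([exemplar], "Is a pound of feathers heavier than a pound of bricks?")
print(prompt)
```

The key design point the abstract motivates: because the evidence is emitted as discrete triples rather than free-form prose, each triple can later be checked individually (the factuality side of F^2-Verification), and the rationale can be checked against the triples it cites (the faithfulness side).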

Jianing Wang, Qiushi Sun, Xiang Li, Ming Gao • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Arithmetic Reasoning | MultiArith | Accuracy | 99.3 | 181
Arithmetic Reasoning | GSM8K | Accuracy | 88.2 | 155
Commonsense Reasoning | CommonsenseQA | Accuracy | 79.3 | 132
Commonsense Reasoning | BoolQ | Accuracy | 69.9 | 131
Question Answering | OpenBookQA (OBQA) (test) | OBQA Accuracy | 86.9 | 130
Commonsense Reasoning | StrategyQA | Accuracy | 67.9 | 125
Question Answering | MedQA (test) | Accuracy | 72.2 | 61
Mathematical Reasoning | AQUA-RAT | Accuracy | 69.7 | 57
Question Answering | CommonsenseQA IH (test) | Accuracy | 73.9 | 57
Question Answering | CommonsenseQA IH (dev) | Accuracy | 75.9 | 53

Showing 10 of 18 rows
