Towards Better Chain-of-Thought: A Reflection on Effectiveness and Faithfulness

About

Chain-of-thought (CoT) prompting performs inconsistently across reasoning tasks. Previous work has evaluated it but offers little in-depth analysis of the patterns that influence CoT. In this paper, we study CoT performance from two perspectives: effectiveness and faithfulness. For effectiveness, we identify key factors that determine how much CoT improves performance, including problem difficulty, information gain, and information flow. For faithfulness, we interpret the unfaithful-CoT issue through a joint analysis of the information interaction among the question, the CoT, and the answer. The results show that, when predicting the answer, the LLM can recall from the question correct information that is missing from the CoT, which leads to unfaithful CoTs. Finally, we propose a novel algorithm to mitigate this issue: it recalls extra information from the question to enhance CoT generation and evaluates CoTs based on their information gain. Extensive experiments demonstrate that our approach improves both the faithfulness and the effectiveness of CoT.
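As a rough illustration of the idea of evaluating CoTs by information gain, the sketch below ranks candidate CoTs by how much each one raises the log-probability of the answer. This is not the authors' algorithm. The proxy definition (log p(answer | question, CoT) minus log p(answer | question)) and the names `information_gain`, `select_cot`, and `toy_logprob` are hypothetical assumptions; a real system would obtain the log-probabilities from an LLM rather than from the toy scorer used here.

```python
from typing import Callable, Sequence

# Hypothetical scorer signature: returns log p(answer | question, cot).
# An empty cot string stands for "question only, no CoT".
LogProbFn = Callable[[str, str, str], float]


def information_gain(question: str, cot: str, answer: str, logprob: LogProbFn) -> float:
    """Assumed proxy: log p(answer | question, cot) - log p(answer | question)."""
    return logprob(question, cot, answer) - logprob(question, "", answer)


def select_cot(question: str, candidates: Sequence[str], answer: str, logprob: LogProbFn) -> str:
    """Return the candidate CoT with the highest (proxy) information gain."""
    if not candidates:
        raise ValueError("need at least one candidate CoT")
    return max(candidates, key=lambda c: information_gain(question, c, answer, logprob))


def toy_logprob(question: str, cot: str, answer: str) -> float:
    # Toy stand-in for an LLM: the answer is likelier if the CoT states it.
    return -0.5 if answer in cot else -2.0


question = "Is Tom a mammal?"
answer = "Tom is a mammal"
cots = [
    "All cats are mammals.",
    "Tom is a cat, and all cats are mammals, so Tom is a mammal.",
]

best = select_cot(question, cots, answer, toy_logprob)
print(best)  # Tom is a cat, and all cats are mammals, so Tom is a mammal.
```

With the toy scorer, the first CoT has a gain of 0.0 and the second a gain of 1.5, so the second is selected. Swapping in real model log-probabilities changes only `toy_logprob`.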

Jiachun Li, Pengfei Cao, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao • 2024

Related benchmarks

Task	Dataset	Metric	Result
Reasoning	PrOntoQA	Accuracy	95	14
Reasoning	ProofWriter	Accuracy	65	14
CoT faithfulness detection	Truthful QA	Accuracy	47.8	12
CoT faithfulness detection	AQUA	Accuracy (CoT Faithfulness)	41.6	12
CoT faithfulness detection	HLE Bio	Accuracy	52.5	11
CoT faithfulness detection	Logic-QA	Accuracy	51.7	11
