Cumulative Reasoning with Large Language Models

About

Recent advancements in large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), a structured framework that enhances LLM problem-solving by emulating human-like iterative and cumulative thought processes. CR orchestrates LLMs in three distinct roles: Proposer, Verifier(s), and Reporter, to systematically decompose tasks, generate and validate intermediate reasoning steps, and compose them into a solution by building a dynamic Directed Acyclic Graph (DAG) of verified propositions. This approach substantially enhances problem-solving capabilities. We demonstrate CR's advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over previous methods. In solving MATH problems, CR achieves a 4.2% increase from previous methods and a 43% relative improvement in the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs' reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%.

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao • 2023
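
The Proposer, Verifier, and Reporter roles described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the three roles are replaced here by deterministic stand-in functions over hypothetical if-then rules rather than LLM calls, and every name (`Rule`, `ReasoningDAG`, `cumulative_reasoning`, `max_rounds`) is an assumption made for illustration. The sketch only shows the control flow: candidate steps are proposed, only verified steps are added to a DAG of propositions, and the reporter reads a solution off that DAG once the goal is reached.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical toy rule: if all premises hold, the conclusion holds."""
    premises: frozenset
    conclusion: str


@dataclass
class ReasoningDAG:
    # Maps each verified proposition to the propositions it was derived from.
    parents: dict = field(default_factory=dict)

    def add(self, proposition, derived_from=()):
        self.parents[proposition] = tuple(derived_from)

    def __contains__(self, proposition):
        return proposition in self.parents


def proposer(dag, rules):
    """Stand-in for the Proposer role: suggest candidate next steps.

    Deliberately sloppy: a rule is proposed as soon as ANY premise is known,
    so some candidates are unsupported and must be filtered by the verifier.
    """
    for rule in rules:
        if rule.conclusion not in dag and any(p in dag for p in rule.premises):
            yield rule


def verifier(dag, rule):
    """Stand-in for the Verifier role: accept only fully supported steps."""
    return all(p in dag for p in rule.premises)


def reporter(dag, goal):
    """Stand-in for the Reporter role: return the derivation of the goal,
    in dependency order, or None if the goal is not yet verified."""
    if goal not in dag:
        return None
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in dag.parents[node]:
            visit(parent)
        order.append(node)

    visit(goal)
    return order


def cumulative_reasoning(facts, rules, goal, max_rounds=10):
    dag = ReasoningDAG()
    for fact in facts:
        dag.add(fact)  # given facts are roots of the DAG
    for _ in range(max_rounds):
        answer = reporter(dag, goal)
        if answer is not None:
            return answer
        accepted = False
        # Materialize proposals first so the DAG is not mutated mid-iteration.
        for rule in list(proposer(dag, rules)):
            if verifier(dag, rule):
                dag.add(rule.conclusion, sorted(rule.premises))
                accepted = True
        if not accepted:
            break  # no verified progress this round, so stop early
    return reporter(dag, goal)


# Hypothetical example rules: A and B give C; C and D give E; C gives F.
rules = [
    Rule(frozenset({"A", "B"}), "C"),
    Rule(frozenset({"C", "D"}), "E"),
    Rule(frozenset({"C"}), "F"),
]
print(cumulative_reasoning(["A", "B"], rules, "F"))  # ['A', 'B', 'C', 'F']
```

In this toy run the rule concluding `E` is proposed but rejected, because `D` is never verified, so `E` never enters the DAG. Storing each proposition's parents is what lets the reporter reconstruct only the steps that actually support the goal.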

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	MATH500 (test)	Accuracy	58	895
Mathematical Reasoning	MATH (test)	Overall Accuracy	90.09	433
Mathematical Reasoning	AIME	AIME Accuracy	62.17	288
Logical reasoning	LogiQA (test)	Accuracy	45.25	151
Logical reasoning	LogiQA	Accuracy	45.25	98
Logical reasoning	FOLIO (test)	Accuracy	69.11	58
Logical reasoning	ProntoQA (test)	Accuracy	98.2	57
Logical reasoning	ProofWriter (test)	Accuracy	71.67	57
Mathematical Reasoning	Game of 24 (test)	Accuracy	98	35
Web navigation	Webshop	--	--	32

Showing 10 of 33 rows
