Cumulative Reasoning with Large Language Models

About

Recent advancements in large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), a structured framework that enhances LLM problem-solving by emulating human-like iterative and cumulative thought processes. CR orchestrates LLMs in three distinct roles: Proposer, Verifier(s), and Reporter, to systematically decompose tasks, generate and validate intermediate reasoning steps, and compose them into a solution by building a dynamic Directed Acyclic Graph (DAG) of verified propositions. This approach substantially enhances problem-solving capabilities. We demonstrate CR's advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over previous methods. In solving MATH problems, CR achieves a 4.2% increase from previous methods and a 43% relative improvement in the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs' reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%. Project Page: https://github.com/iiis-ai/cumulative-reasoning.

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao • 2023
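
The Proposer / Verifier / Reporter loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the project page for that). The function names, the `RULES` table, and the toy stand-ins for the three roles are all hypothetical; in CR each role would be played by an LLM call. The sketch only shows the control flow: a proposition joins the DAG of verified propositions when every parent is already in the DAG and the verifier accepts it, so the graph stays acyclic by construction.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A proposal is (new proposition, propositions it is derived from).
Proposal = Tuple[str, Tuple[str, ...]]
# The DAG maps each verified proposition to its parent propositions.
Dag = Dict[str, Tuple[str, ...]]


def cumulative_reasoning(
    premises: List[str],
    propose: Callable[[Dag], List[Proposal]],
    verify: Callable[[Proposal], bool],
    report: Callable[[Dag], Optional[str]],
    max_rounds: int = 10,
) -> Tuple[Optional[str], Dag]:
    """Grow a DAG of verified propositions until the reporter can answer."""
    dag: Dag = {p: () for p in premises}  # premises are the root nodes
    for _ in range(max_rounds):
        answer = report(dag)
        if answer is not None:
            return answer, dag
        added = False
        for prop, parents in propose(dag):
            if prop in dag:
                continue  # already verified earlier
            # Edges only point to existing nodes, so no cycle can form.
            if all(p in dag for p in parents) and verify((prop, parents)):
                dag[prop] = parents
                added = True
        if not added:
            break  # nothing new could be verified; stop early
    return report(dag), dag


# --- Toy stand-ins for the three roles (assumptions, not from the paper) ---
RULES = {("A", "B"): "C", ("C",): "D"}  # hypothetical inference rules


def toy_propose(dag: Dag) -> List[Proposal]:
    proposals = [(concl, prem) for prem, concl in RULES.items()
                 if all(p in dag for p in prem)]
    proposals.append(("E", ("A",)))  # deliberately unsupported guess
    return proposals


def toy_verify(proposal: Proposal) -> bool:
    prop, parents = proposal
    return RULES.get(parents) == prop


def toy_report(dag: Dag) -> Optional[str]:
    return "D holds" if "D" in dag else None


answer, dag = cumulative_reasoning(["A", "B"], toy_propose, toy_verify, toy_report)
print(answer)  # D holds
print(dag)
```

In this toy run the unsupported guess `E` is rejected by the verifier and never enters the graph, while `C` and then `D` are added over successive rounds, each building on previously verified nodes.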

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | MATH (test) | Overall Accuracy: 90.09 | 433
Mathematical Reasoning | MATH500 (test) | Accuracy: 58 | 381
Mathematical Reasoning | AIME | AIME Accuracy: 62.17 | 283
Logical reasoning | LogiQA | Accuracy: 45.25 | 98
Logical reasoning | LogiQA (test) | Accuracy: 45.25 | 92
Logical reasoning | FOLIO (test) | Accuracy: 69.11 | 58
Logical reasoning | ProntoQA (test) | Accuracy: 98.2 | 36
Logical reasoning | ProofWriter (test) | Accuracy: 71.67 | 36
Mathematical Reasoning | Game of 24 (test) | Accuracy: 98 | 35
General Mathematics Reasoning | Math-G College-math, Math-OAI, Minerva-math (test) | Accuracy: 54.1 | 24
Showing 10 of 26 rows
