Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures

About

Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, yet their performance is highly dependent on the prompting strategy and model scale. While reinforcement learning and fine-tuning have been deployed to boost reasoning, these approaches incur substantial computational and data overhead. In this work, we introduce Adaptive Graph of Thoughts (AGoT), a dynamic, graph-based inference framework that enhances LLM reasoning solely at test time. Rather than relying on fixed-step methods like Chain of Thought (CoT) or Tree of Thoughts (ToT), AGoT recursively decomposes complex queries into structured subproblems, forming a dynamic directed acyclic graph (DAG) of interdependent reasoning steps. By selectively expanding only those subproblems that require further analysis, AGoT unifies the strengths of chain, tree, and graph paradigms into a cohesive framework that allocates computation where it is most needed. We validate our approach on diverse benchmarks spanning multi-hop retrieval, scientific reasoning, and mathematical problem-solving, achieving up to 46.2% improvement on scientific reasoning tasks (GPQA), comparable to gains achieved through computationally intensive reinforcement learning approaches and outperforming state-of-the-art iterative approaches. These results suggest that dynamic decomposition and structured recursion offer a scalable, cost-effective alternative to post-training modifications, paving the way for more robust, general-purpose reasoning in LLMs.

Tushar Pandey, Ara Ghukasyan, Oktay Goktas, Santosh Kumar Radha • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	Game of 24	Accuracy	74	147
Multi-hop Question Answering	MoreHopQA	Accuracy	70	25
Multi-hop Question Answering	HotpotQA	Accuracy	72	15
Graduate-level Question Answering	GPQA	Accuracy	64.6	11
Multiple-Choice Reasoning	GPQA (test)	Accuracy	64.6	11
Explorative Reasoning	Game of 24 (test)	Accuracy	74	11
Open-ended Question Answering	HybridQA (test)	Accuracy	84	11
Question Answering over Tables and Text	HybridQA	Accuracy	84	11
Explorative Reasoning	Crosswords Word-level (test)	Accuracy	3.5	11
Open-ended Question Answering	MoreHopQA (test)	Accuracy	70	11

10 of 20 benchmark rows shown.
