
Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models

About

Complex reasoning remains a critical bottleneck for the practical application of Large Language Models (LLMs). Test-time expansion methods such as Tree-of-Thought (ToT) and Graph-of-Thought (GoT) enhance reasoning by introducing intermediate reasoning structures, tree search, or graph-based exploration mechanisms. However, their reasoning strategies suffer from limited diversity, redundant search branches, and inadequate integration and error correction across heterogeneous reasoning paths. To address these limitations, we propose a novel reasoning framework called Multi-chain Graph Refinement & Selection (MGRS), which first generates multiple diverse reasoning trajectories for a given problem, refines candidate responses using a composite self- and cross-verification strategy, then constructs a reasoning relation graph and estimates the success rate of intermediate nodes, and finally computes cumulative success rates to select the most reliable answer and its corresponding reasoning trajectory. Experimental results demonstrate that MGRS significantly advances both the reasoning capability and computational efficiency of reasoning enhancement methods. Across six benchmark datasets spanning four distinct tasks, MGRS achieves an average accuracy of 82.9%, outperforming state-of-the-art baselines by a clear margin of 2.1%. Remarkably, on the Game of 24, MGRS attains 100% accuracy for the first time, while delivering a 13.6x speed-up over the leading Forest of Thoughts framework.
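The selection stage described above can be illustrated with a minimal sketch. The abstract does not give the authors' exact scoring formula, so the snippet below makes two labeled assumptions: each distinct intermediate step becomes a graph node, and a node's success rate is approximated by its frequency across the generated chains; a chain's cumulative score is the product of its nodes' estimated rates.

```python
from collections import defaultdict

def select_best_chain(chains):
    """Select the most reliable reasoning trajectory.

    `chains` is a list of trajectories, each a list of step strings
    ending in a final answer. Each distinct step is treated as a node
    in a reasoning relation graph; its frequency across chains serves
    as a stand-in for the per-node success-rate estimate (an assumed
    proxy, not the paper's exact formula). A chain's cumulative
    success rate is the product of its nodes' estimated rates.
    """
    counts = defaultdict(int)
    for chain in chains:
        for step in chain:
            counts[step] += 1
    n = len(chains)

    def cumulative(chain):
        score = 1.0
        for step in chain:
            score *= counts[step] / n  # estimated node success rate
        return score

    return max(chains, key=cumulative)

# Three hypothetical trajectories; two agree on the answer.
chains = [
    ["a+b=5", "5*2=10", "answer: 10"],
    ["a+b=5", "5+5=10", "answer: 10"],
    ["a+b=6", "6*2=12", "answer: 12"],
]
best = select_best_chain(chains)
```

Here the first two chains share the step `a+b=5` and the answer `answer: 10`, so their cumulative scores dominate the outlier chain; the selected trajectory therefore ends in `answer: 10`.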

Yujiao Yang, Jing Lian, Linhui Li • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Reasoning | GSM8K (test) | Accuracy | 96.5 | 797
Multi-hop Question Answering | HotpotQA N=1,000 (test) | F1 Score | 83.5 | 23
Arithmetic Reasoning | Game of 24 95 (test) | Success Rate | 100 | 9
Knowledge-intensive Reasoning | MMLU-CF first 1,000 samples (test) | Exact Match Accuracy | 74.2 | 7
Logical Reasoning | BBH multiple-choice (first 1,000 samples) | Exact Match Accuracy | 86.2 | 7
Mathematical Reasoning | MATH first 1,000 samples (test) | Exact Match Accuracy | 86.8 | 7
Multi-hop Reasoning | LongBench MuSiQue and WikiMultiHopQA | F1 Score | 69.9 | 7
