
Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis

About

LLM reasoning traces suffer from complex flaws: *Step Internal Flaws* (logical errors, hallucinations, etc.) and *Step-wise Flaws* (overthinking, underthinking), which vary by sample. A natural approach would be to provide ground-truth labels to guide LLMs' reasoning. Contrary to intuition, we show that this yields no improvement in reasoning ability. We then propose CRAFT, a unified framework that mitigates both types of step flaws. CRAFT builds a Reasoning Knowledge Graph (RKG) from the consensus parts of multiple candidate traces and synthesizes a high-quality trace through topological generation. Our approach improves label-prediction accuracy by more than 10% on average and consistently outperforms all baselines across both logical and mathematical reasoning benchmarks. Further, detailed benchmark evaluation shows that our method also improves the quality of LLMs' reasoning traces along multiple dimensions.
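
To make the idea of consensus plus topological generation concrete, here is a minimal Python sketch. It is not the CRAFT implementation. It assumes each candidate trace is a list of step strings, that identical strings denote the same step (a real system would need some form of semantic matching), and that "consensus" means a step-to-step transition appears in at least `min_support` traces. The names `consensus_graph`, `synthesize_trace`, and `min_support` are hypothetical.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def consensus_graph(traces, min_support=2):
    """Map each step to its predecessors, keeping edges seen in >= min_support traces."""
    edge_votes = Counter()
    for trace in traces:
        # Count each (previous step -> next step) edge at most once per trace.
        edge_votes.update(set(zip(trace, trace[1:])))
    graph = {}
    for (src, dst), votes in edge_votes.items():
        if votes >= min_support:
            graph.setdefault(src, set())
            graph.setdefault(dst, set()).add(src)
    return graph


def synthesize_trace(traces, min_support=2):
    """Order the consensus steps so every step follows its predecessors."""
    graph = consensus_graph(traces, min_support)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        # Assumption: a cyclic consensus graph yields no trace in this toy version.
        return []


# Toy candidate traces; step "X" appears in only one of them.
candidate_traces = [
    ["A", "B", "C", "D"],
    ["A", "B", "X", "D"],
    ["A", "B", "C", "D"],
]
synthesized = synthesize_trace(candidate_traces)
```

In this toy input the transitions through `X` each occur in a single trace, so they fall below the support threshold and `X` is left out of the graph, while the remaining steps form a single chain that the topological sort can order.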

Zipeng Ling, Shuliang Liu, Shenghong Fu, Yuehao Tang, Seonil Son, Yao Wan, Xuming Hu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Logical reasoning | FLD | Accuracy | 75.6 | 20 |
| Mathematical Reasoning | OlympiadBench | Accuracy (%) | 73.8 | 20 |
| Mathematical Reasoning | GSM8K | Accuracy (%) | 98 | 20 |
| Reasoning trace quality evaluation | CosmosQA | Grammar Score | 2.1 | 2 |
| Reasoning trace quality evaluation | DROP | Grammar | 2.8 | 2 |
| Reasoning trace quality evaluation | esnli | Grammar Score | 5.9 | 2 |
