GeoFaith: A Spatio-Temporal Dual View of Faithful Chain-of-Thought

About

Chain-of-Thought (CoT) reasoning has advanced large language models (LLMs), but outcome-based supervision leads to pervasive post-hoc rationalization, producing plausible yet unfaithful reasoning chains. Most prior faithfulness assessment methods are unscalable, expensive, or unreliable. We propose GeoFaith, a spatio-temporal framework that leverages latent geometric structure and entropy dynamics to diagnose and enforce faithful reasoning. We develop a scalable bootstrapping pipeline that expands step-level annotations from 1k to 20k samples across four domains, train an 8B faithfulness detector that outperforms GPT-5 on standard benchmarks, and design a faithfulness-aware reinforcement learning framework that jointly optimizes outcome correctness, process faithfulness, and trajectory consistency. Experiments show that the proposed method achieves superior performance on both faithfulness detection and downstream reasoning, producing shorter, more interpretable chains without sacrificing accuracy. Our code will be made publicly available.

Weijiang Lv, Wentong Zhao, Jiayu Wang, Yuhao Wu, Jiaheng Wei, Xiaobo Xia • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Graduate-Level Reasoning | GPQA D | Accuracy: 49.5 | 12 |
| Logical reasoning | LogiQA | Accuracy (LogiQA): 68.9 | 12 |
| Multi-hop Reasoning | 2WikiMultihopQA | Accuracy: 82.1 | 12 |
| Reasoning | AMC23 | Accuracy: 95 | 12 |
| Reasoning | Overall | Overall Accuracy: 73.9 | 12 |
| Faithfulness Detection | Step-level Benchmark In-domain Math | FF1: 84.2 | 10 |
| Faithfulness Detection | In-domain Step-level Benchmark Reasoning | FF1: 84.5 | 10 |
| Faithfulness Detection | In-domain Step-level Benchmark Knowledge | FF1: 83.4 | 10 |
| Faithfulness Detection | In-domain Step-level Benchmark Agent | FF1: 80.2 | 10 |
| Faithfulness Detection | RAGTruth | Accuracy: 90.3 | 10 |
Showing 10 of 13 rows
