
ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention

About

Large Reasoning Models (LRMs) often reach a correct solution before their long Chain-of-Thought trace ends, yet continue with redundant verification, repeated attempts, or unnecessary exploration that wastes computation and can even overturn the correct answer. We frame this behavior as a latent productive-to-redundant transition and show that it is directly reflected in hidden states: around first-correct-solution (FCS) boundaries, late-layer representations separate efficient from overthinking tokens, while the boundary-permutation and position-control baselines collapse. Based on this signal, we propose ROM, a model-agnostic streaming intervention framework that monitors frozen LRMs with a lightweight hidden-state detector and intervenes at well-formed reasoning boundaries. Counterfactual Self-Correction (CSC) augments supervision with balanced wrong-to-correct trajectories, preserving useful pre-FCS correction while labeling only the post-FCS continuation as redundant. Across MATH500, GSM8K, AIME25, and MMLU-Pro, ROM improves the accuracy-length tradeoff on both Qwen3-8B and DeepSeek-R1-Distill-Qwen-32B (DS-32B). On Qwen3-8B, it raises accuracy from 74.47% to 74.78% and reduces response length from 4262 to 3107 tokens; on DS-32B, it raises accuracy from 68.60% to 68.72% and reduces response length from 3062 to 2319 tokens. The same FCS-derived supervision transfers across scale and training origin, suggesting a shared long-CoT boundary rather than a backbone-specific artifact. ROM is compatible with L1, removing a further 20.9-21.6% of tokens at zero accuracy loss. ROM also generalizes to open-ended MMLU-Pro (+1.56 pp accuracy, 35.4% shorter responses) and reduces wall-clock latency by 46.5%. Code is available at https://github.com/SaFo-Lab/ROM.
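The detect-then-intervene loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the released ROM code: the logistic probe, the 3-token smoothing window, the 0.5 threshold, the paragraph-break boundary rule, and the names `fcs_labels`, `StreamingMonitor`, and `run_stream` are all assumptions. Real hidden states would come from a late layer of the frozen LRM rather than the one-dimensional toy vectors used here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


def fcs_labels(num_tokens: int, fcs_end: int) -> List[int]:
    """Hypothetical FCS-derived labels: 0 = productive (up to and including the
    first-correct-solution boundary), 1 = redundant post-FCS continuation."""
    if not 0 <= fcs_end < num_tokens:
        raise ValueError("fcs_end must index a token in the trace")
    return [0 if i <= fcs_end else 1 for i in range(num_tokens)]


@dataclass
class StreamingMonitor:
    """Assumed design: a logistic probe over per-token hidden states, averaged
    over a short window, that may only fire at a reasoning boundary."""
    weights: Sequence[float]
    bias: float = 0.0
    threshold: float = 0.5
    window: int = 3
    _recent: List[float] = field(default_factory=list)

    def score(self, hidden: Sequence[float]) -> float:
        # Probability that this token belongs to redundant (overthinking) text.
        z = self.bias + sum(w * h for w, h in zip(self.weights, hidden))
        return 1.0 / (1.0 + math.exp(-z))

    def step(self, hidden: Sequence[float], is_boundary: bool) -> bool:
        """Feed one token; return True if generation should be cut here."""
        self._recent.append(self.score(hidden))
        if len(self._recent) > self.window:
            self._recent.pop(0)
        mean = sum(self._recent) / len(self._recent)
        # Intervene only at a well-formed boundary, never mid-step.
        return is_boundary and mean > self.threshold


def run_stream(monitor: StreamingMonitor, tokens: Sequence[str],
               hiddens: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the token index where the monitor intervenes, or None."""
    for i, (tok, h) in enumerate(zip(tokens, hiddens)):
        # Hypothetical boundary rule: a paragraph break ends a reasoning step.
        if monitor.step(h, is_boundary=tok.endswith("\n\n")):
            return i
    return None


# Toy trace: 4 productive tokens (FCS ends at index 3), then redundant checking.
tokens = ["x", " =", " 2", ".\n\n", "Check", ":", " 2+2", "=4", ".\n\n", "Again"]
hiddens = [[0.0]] * 4 + [[2.0]] * 6
cut = run_stream(StreamingMonitor(weights=[2.0], bias=-1.0), tokens, hiddens)
print("intervene at token", cut)  # → intervene at token 8
print(fcs_labels(len(tokens), fcs_end=3))
```

With `weights=[2.0]` and `bias=-1.0`, the monitor stays quiet at the first boundary (index 3) and cuts generation at the next boundary (index 8), after one redundant verification step. Gating the decision on a boundary mirrors the idea of intervening only at well-formed reasoning points, so the trace is never truncated mid-step.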

Xinyan Wang, Xiaogeng Liu, Chaowei Xiao · 2026

Related benchmarks

Task                               | Dataset    | Metric   | Result | Rank
-----------------------------------|------------|----------|--------|-----
Mathematical Reasoning             | MATH 500   | Accuracy | 87.5   | 543
Mathematical Reasoning             | MAWPS      | Accuracy | 97.5   | 241
General Reasoning                  | Overall    | Accuracy | 93.51  | 24
Mathematical Reasoning             | MultiArith | Accuracy | 99.2   | 16
Mathematical Reasoning             | SVAMP      | Accuracy | 95.8   | 10
Aggregate Reasoning Performance    | Overall    | Accuracy | 93.51  | 8
Multiple-choice Question Reasoning | MMLU-Pro   | Accuracy | 77.1   | 8
Mathematical Reasoning             | SVAMP      | Accuracy | 95.8   | 8
Mathematical Reasoning             | MATH500    | Accuracy | 87.5   | 8
Mathematical Reasoning             | GSM8K      | Accuracy | 100    | 8

(10 of 13 rows shown.)
