Recursive Think-Answer Process for LLMs and VLMs

About

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues like "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that enables models to engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way for more efficient and elaborate methods of refining the reasoning processes of future AI.
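
The abstract does not spell out the mechanics of the recursive loop or of the two rewards, so the following is only a minimal sketch of one way the idea could be read, not the authors' implementation. Every name here (`recursive_think_answer`, `generate`, `confidence`, `threshold`, `max_cycles`, the toy stubs) is hypothetical, and both reward formulas are assumptions inferred from the reward names alone.

```python
from typing import Callable, List, Tuple


def recursive_think_answer(
    question: str,
    generate: Callable[[str, List[str]], str],   # hypothetical: one think-answer pass
    confidence: Callable[[str, str], float],     # hypothetical: confidence generator in [0, 1]
    threshold: float = 0.9,                      # assumed stopping threshold
    max_cycles: int = 4,                         # assumed cap on cycles
) -> Tuple[str, List[float]]:
    """Repeat think-answer cycles until the confidence score is high enough."""
    history: List[str] = []
    scores: List[float] = []
    for _ in range(max_cycles):
        answer = generate(question, history)     # new attempt, conditioned on earlier ones
        score = confidence(question, answer)
        history.append(answer)
        scores.append(score)
        if score >= threshold:                   # confident enough: stop early
            break
    return history[-1], scores


def confidence_increase_reward(scores: List[float]) -> float:
    """Assumed form: fraction of consecutive cycles where confidence strictly rose."""
    if len(scores) < 2:
        return 0.0
    gains = sum(1 for prev, cur in zip(scores, scores[1:]) if cur > prev)
    return gains / (len(scores) - 1)


def final_confidence_reward(scores: List[float]) -> float:
    """Assumed form: the confidence assigned to the last answer."""
    return scores[-1] if scores else 0.0


# Toy stand-ins for a model and a confidence generator (illustration only).
def toy_generate(question: str, history: List[str]) -> str:
    drafts = ["41", "42"]
    return drafts[min(len(history), len(drafts) - 1)]


def toy_confidence(question: str, answer: str) -> float:
    return 0.95 if answer == "42" else 0.4
```

With the toy stubs, `recursive_think_answer("6*7?", toy_generate, toy_confidence)` runs two cycles: the first draft scores below the threshold, so the loop tries again and stops once the second draft clears it. In actual training the two rewards would presumably be combined into a reinforcement-learning signal, but the abstract gives no weighting or formula, so none is assumed here.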

Byung-Kwan Lee, Youngchae Chee, Yong Man Ro • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Multimodal Reasoning | MathVerse | Accuracy 61.8 | 259 |
| Mathematical Multimodal Reasoning | MathVista | Accuracy 80.2 | 258 |
| Multimodal Math Reasoning | MathVision | Accuracy 39.9 | 246 |
| Mathematical Reasoning | Minerva Math | Accuracy 43.8 | 233 |
| Multimodal Math Reasoning | WeMath | Accuracy 79.3 | 211 |
| Mathematical Reasoning | AIME 2024 (test) | Accuracy 28.3 | 209 |
| Mathematics | MATH 500 | Pass@1 97.3 | 122 |
| Reading Comprehension | DROP | F1 Score 84.5 | 96 |
| Mathematical Reasoning | MATH500 | Accuracy 83.5 | 82 |
| Mathematical Reasoning | OlympiadBench | Accuracy 0.538 | 72 |
Showing 10 of 31 rows
