Recursive Think-Answer Process for LLMs and VLMs

About

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues like "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that lets models engage in iterative reasoning cycles and generate more accurate answers than conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way for efficient and elaborate methods that refine the reasoning processes of future AI.
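The abstract does not give the exact formulation of R-TAP, so the following is only a minimal sketch of the recursive loop it describes, under stated assumptions. `think_answer` and `confidence_generator` are hypothetical callables standing in for the model and its confidence generator; the stopping rule (`stop_threshold`, `max_rounds`) is an assumption; and both reward functions are illustrative guesses at what the Recursively Confidence Increase Reward and Final Answer Confidence Reward could measure, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Round:
    """One think-answer pass: the answer text and its estimated confidence."""
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def recursive_think_answer(
    question: str,
    think_answer: Callable[[str, List[Round]], str],      # hypothetical model call
    confidence_generator: Callable[[str, str], float],    # hypothetical scorer
    max_rounds: int = 3,          # assumed cap on recursion depth
    stop_threshold: float = 0.9,  # assumed confidence needed to stop early
) -> List[Round]:
    """Repeat think-answer passes; each pass sees all earlier rounds."""
    history: List[Round] = []
    for _ in range(max_rounds):
        answer = think_answer(question, history)
        conf = confidence_generator(question, answer)
        history.append(Round(answer, conf))
        if conf >= stop_threshold:
            break  # confident enough: stop recursing
    return history


def confidence_increase_reward(history: List[Round]) -> float:
    """Hypothetical form: fraction of consecutive rounds where confidence rose."""
    if len(history) < 2:
        return 0.0
    rises = sum(
        1 for prev, cur in zip(history, history[1:])
        if cur.confidence > prev.confidence
    )
    return rises / (len(history) - 1)


def final_answer_confidence_reward(history: List[Round], is_correct: bool) -> float:
    """Hypothetical form: reward high final confidence only on correct answers."""
    final_conf = history[-1].confidence
    return final_conf if is_correct else 1.0 - final_conf


# Toy stand-ins (hypothetical): each pass emits a new version of the answer,
# and the scorer assigns fixed confidences to each version.
def toy_model(question: str, history: List[Round]) -> str:
    return f"answer v{len(history) + 1}"


def toy_confidence(question: str, answer: str) -> float:
    return {"answer v1": 0.5, "answer v2": 0.95}.get(answer, 0.0)


rounds = recursive_think_answer("2 + 2 = ?", toy_model, toy_confidence)
```

With these toy stand-ins the loop stops after the second pass, because its confidence (0.95) clears the assumed threshold. Scoring the whole trajectory (confidence rising across rounds) separately from the final answer's confidence is one plausible reading of why the abstract calls the two rewards complementary.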

Byung-Kwan Lee, Youngchae Chee, Yong Man Ro • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Multimodal Reasoning	MathVerse	Accuracy	61.8	259
Mathematical Multimodal Reasoning	MathVista	Accuracy	80.2	258
Multimodal Math Reasoning	MathVision	Accuracy	39.9	246
Mathematical Reasoning	Minerva Math	Accuracy	43.8	233
Multimodal Math Reasoning	WeMath	Accuracy	79.3	211
Mathematical Reasoning	AIME 2024 (test)	Accuracy	28.3	209
Mathematics	MATH 500	Pass@1	97.3	122
Reading Comprehension	DROP	F1 Score	84.5	96
Mathematical Reasoning	MATH500	Accuracy	83.5	82
Mathematical Reasoning	OlympiadBench	Accuracy	0.538	72

Showing 10 of 31 rows
