Test-time Recursive Thinking: Self-Improvement without External Feedback
About
Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these LLMs can self-improve without additional training. We identify two core challenges for such systems: (i) efficiently generating diverse, high-quality candidate solutions, and (ii) reliably selecting correct answers in the absence of ground-truth supervision. To address these challenges, we propose Test-time Recursive Thinking (TRT), an iterative self-improvement framework that conditions generation on rollout-specific strategies, accumulated knowledge, and self-generated verification signals. Using TRT, open-source models reach 100% accuracy on AIME-25/24, and on LiveCodeBench's most difficult problems, closed-source models improve by 10.4-14.8 percentage points without external feedback.
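The loop described above can be sketched as follows. This is a minimal illustration only, not the paper's implementation: every function name is hypothetical, and the generator and verifier are deterministic stand-ins for what would be LLM calls conditioned on a rollout strategy, accumulated knowledge, and a self-verification prompt.

```python
def generate_candidate(problem: int, strategy: str, knowledge: list) -> int:
    # Placeholder generator: a real system would prompt an LLM, conditioning
    # on the rollout-specific strategy and the knowledge accumulated so far.
    return (problem + len(strategy) + len(knowledge)) % 7

def self_verify(problem: int, answer: int) -> float:
    # Placeholder self-generated verification signal; no ground truth is
    # consulted, matching the paper's setting of no external feedback.
    return 1.0 if answer % 2 == 0 else 0.5

def trt(problem: int, strategies: list, rounds: int = 3) -> int:
    """Iterate: diverse rollouts -> self-verification -> knowledge update."""
    knowledge = []  # accumulated across rounds, conditioning later rollouts
    best = None
    for _ in range(rounds):
        scored = []
        for strategy in strategies:  # one rollout per strategy for diversity
            answer = generate_candidate(problem, strategy, knowledge)
            scored.append((self_verify(problem, answer), answer))
        scored.sort(reverse=True)    # select by self-verification score only
        best = scored[0][1]
        knowledge.append(best)       # recursive step: feed best answer back
    return best
```

The key design choice the sketch mirrors is that selection relies solely on the model's own verification signal, and each round's selected answer is appended to the knowledge that conditions the next round's generation.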
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Visual Grounded Reasoning | TreeBench | Overall Score | 48.9 | 128 |
| Visual Perception and Reasoning | V*Bench | Attribute Score | 92.2 | 41 |
| High-Resolution Multimodal Reasoning | HR-Bench-4K | Overall Score | 86.2 | 40 |
| High-Resolution Multimodal Reasoning | HR-Bench-8K | Overall Score | 83.9 | 40 |
| Perception | MME-RealWorld-Lite | Overall Score | 56.8 | 29 |
| Reasoning | MME-RealWorld-Lite | OCR Score | 81 | 20 |
| Visual Question Answering | VisualProbe Medium | Accuracy | 39.6 | 9 |
| Visual Question Answering | VisualProbe Hard | Accuracy | 40.6 | 9 |
| Visual Question Answering | VisualProbe (Overall) | Accuracy | 45.3 | 9 |
| Visual Question Answering | VisualProbe Easy | Accuracy | 59.7 | 9 |