SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

About

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed-insufficient under perceptual ambiguity, where reconsidering how to perceive is as important as deciding what to do. To address these limitations, we propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on 'self-uncertainty', inspired by uncertainty-driven exploration in Active Inference theory-requiring no additional training, no verifier, and only a single forward pass. SCALE broadens exploration in both perception and action under high uncertainty, while focusing on exploitation when confident-enabling adaptive execution across varying conditions. Experiments on simulated and real-world benchmarks demonstrate that SCALE improves state-of-the-art VLAs and outperforms existing TTS methods while maintaining single-pass efficiency.

Hyeonbeom Choi, Daechul Ahn, Youhan Lee, Taewook Kang, Seongwon Cho, Jonghyun Choi• 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	Object Achievement98.7	1025
Robotic Manipulation	LIBERO Spatial Object Goal Long	Overall Success Rate (Long)63.3	91
Pick-&-Place	SIMPLER-WidowX	Average Success Rate49	21
Multi-task Robot Manipulation	LIBERO-PRO-Long unseen benchmark	Language Error51.2	10
Pick-&-Place	Real-world 'Put A on B' pick-and-place (In-Distribution)	SR (Carrot/Towel)87.5	4
Pick-&-Place	Real-world 'Put A on B' pick-and-place Out-of-Distribution	SR (Teddy Bear on Bowl)50	4
Robotic Manipulation	LIBERO Small Viewpoint Shift	Success Rate0.85	2
Robotic Manipulation	LIBERO Medium Viewpoint Shift	Success Rate71.6	2
Robotic Manipulation	LIBERO Large Viewpoint Shift	Success Rate62.5	2
Robotic Manipulation	LIBERO Average Viewpoint Shift	Success Rate73	2

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord