Gaze-Regularized Vision-Language-Action Models for Robotic Manipulation

About

Despite advances in Vision-Language-Action (VLA) models, robotic manipulation still struggles with fine-grained tasks because current models lack mechanisms for actively allocating visual attention. Human gaze naturally encodes intent, planning, and execution patterns, offering a powerful supervisory signal for guiding robot perception. We introduce a gaze-regularized training framework that aligns a VLA model's internal attention with human visual patterns without architectural modifications or inference-time overhead. Our method converts temporally aggregated gaze heatmaps into patch-level distributions and regularizes the transformer's attention toward them with a KL-divergence penalty, creating an inductive bias toward task-relevant features while preserving deployment efficiency. When integrated into existing VLA architectures, our approach yields 4-12% improvements across manipulation benchmarks. Gaze-regularized models reach equivalent performance in fewer training steps and remain robust under lighting variations and sensor noise. Beyond performance metrics, the learned attention patterns yield interpretable visualizations that mirror human strategies, enhancing trust in robotic systems. Moreover, the framework requires no eye-tracking equipment and applies directly to existing datasets. These results demonstrate that human perceptual priors can significantly accelerate robot learning while improving both task performance and interpretability.
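The abstract describes the regularizer only at a high level: gaze heatmaps are aggregated over time, pooled into a distribution over image patches, and compared with the transformer's attention through a KL divergence. The NumPy sketch below illustrates that pipeline under several assumptions that the abstract does not state: an exponentially weighted temporal aggregation, sum-pooling over non-overlapping ViT-style patches, attention averaged across heads for a single query token, the KL taken in the direction KL(gaze || attention), and a hypothetical weight `lam` added to the action loss. All function names are illustrative and are not the authors' code.

```python
import numpy as np

# Hypothetical helpers sketching a gaze-to-attention KL regularizer.
# None of these names come from the paper; they only illustrate the idea.


def aggregate_gaze(heatmaps: np.ndarray, decay: float = 0.8) -> np.ndarray:
    """Collapse a (T, H, W) stack of gaze heatmaps into one (H, W) map.

    ASSUMPTION: exponentially weighted average, newest frame (index T-1)
    weighted most; the paper's actual aggregation scheme may differ.
    """
    num_frames = heatmaps.shape[0]
    weights = decay ** np.arange(num_frames - 1, -1, -1, dtype=np.float64)
    weights /= weights.sum()
    # Contract the time axis: sum_t weights[t] * heatmaps[t].
    return np.tensordot(weights, heatmaps, axes=1)


def heatmap_to_patch_distribution(
    heatmap: np.ndarray, patch_size: int, eps: float = 1e-6
) -> np.ndarray:
    """Sum-pool an (H, W) heatmap over non-overlapping patches and normalize.

    Patches are flattened in row-major order, the usual ViT token order.
    eps keeps every entry strictly positive so the KL term stays finite.
    """
    height, width = heatmap.shape
    if height % patch_size or width % patch_size:
        raise ValueError("heatmap size must be divisible by patch_size")
    gh, gw = height // patch_size, width // patch_size
    pooled = heatmap.reshape(gh, patch_size, gw, patch_size).sum(axis=(1, 3))
    pooled = pooled.reshape(-1) + eps
    return pooled / pooled.sum()


def attention_to_patch_distribution(attn: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Average (num_heads, num_patches) attention over heads and renormalize.

    ASSUMPTION: attn holds one query token's weights onto the image-patch
    tokens; renormalizing discards mass the query put on non-image tokens.
    """
    mean = attn.mean(axis=0) + eps
    return mean / mean.sum()


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for strictly positive distributions over the same patches."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def gaze_regularized_loss(
    action_loss: float, gaze_dist: np.ndarray, attn_dist: np.ndarray, lam: float = 0.1
) -> float:
    """Task loss plus a weighted KL(gaze || attention) penalty (lam is hypothetical)."""
    return action_loss + lam * kl_divergence(gaze_dist, attn_dist)


# Usage with random stand-in data: 4 gaze frames at 224x224, 14-pixel patches
# (a 16x16 grid = 256 patch tokens), and 8 attention heads.
rng = np.random.default_rng(0)
heatmaps = rng.random((4, 224, 224))
gaze = heatmap_to_patch_distribution(aggregate_gaze(heatmaps), patch_size=14)
attn_dist = attention_to_patch_distribution(rng.random((8, 256)))
loss = gaze_regularized_loss(action_loss=1.0, gaze_dist=gaze, attn_dist=attn_dist)
print(f"KL(gaze || attn) = {kl_divergence(gaze, attn_dist):.4f}, total loss = {loss:.4f}")
```

NumPy is used here only to show the arithmetic. In an actual training loop the same computation would be written in the training framework so that gradients from the KL term flow back into the attention weights. Because the penalty only touches attention maps that the model already computes, it adds no parameters and disappears at inference time, which is consistent with the abstract's claim of no inference-time overhead.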

Anupam Pani, Yanchao Yang · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robotic Manipulation | LIBERO | -- | -- | 314
Robot Manipulation | LIBERO Object | Success Rate | 97.3 | 70
Robotic Manipulation | LIBERO-10 | Success Rate | 77.9 | 27
Robotic Manipulation | LIBERO Spatial | Average Success Rate | 95.5 | 17
Robotic Manipulation | Place cube (MA2 problem) 1.0 (real-world) | Success Rate | 44 | 8
Robotic Manipulation | Aloha-Simulation Transfer Cube | Success Rate | 77.5 | 6
Robotic Manipulation | Aloha-Simulation Peg Insertion | Success Rate | 18.8 | 6
Robotic Manipulation | Aloha-Simulation Gym-Aloha Average | Success Rate | 48.2 | 6
Robotic Manipulation | Real-world Pick cup and place it in container | Success Rate | 72 | 4
Robotic Manipulation | Real-world Pick multiple cups and place in container | Success Rate | 40 | 4