
GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

About

Vision-Language-Action (VLA) models often fail to generalize to unseen camera viewpoints, a limitation stemming from their difficulty in inferring robust 3D geometry from 2D images. We introduce GeoAware-VLA, a simple yet effective approach that enhances viewpoint invariance by integrating strong geometric priors into the vision backbone. Instead of training a visual encoder or relying on explicit 3D data, we leverage a frozen, pretrained geometric vision model as a feature extractor. A lightweight, trainable projection layer then adapts these geometrically-rich features for the policy decoder, relieving it of the burden of learning 3D consistency from scratch. Through extensive evaluations on the LIBERO and CALVIN benchmarks, we show that GeoAware-VLA preserves and even improves in-distribution performance while achieving substantial gains in zero-shot generalization to unseen camera poses, improving unseen-view success rates by an average of 35 percentage points on LIBERO and over 11 percentage points on CALVIN compared to their respective baselines. Crucially, these gains transfer to the physical world, where our model shows significant improvement on a real robotic platform. Our approach proves effective across both continuous and discrete action spaces, highlighting that robust geometric grounding is a key ingredient for building more generalizable robotic agents.

Ali Abouzeid, Malak Mansour, Qinbo Sun, Zezhou Sun, Dezhen Song • 2025
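
The split described in the abstract (a frozen pretrained geometric encoder, a lightweight trainable projection, and a policy decoder on top) can be illustrated with a toy sketch. This is not the authors' implementation: the class names, the dimensions (16 inputs, 8 features, 7 action values), the tanh stand-in encoder, the squared-error loss, and the omission of the policy decoder (the projection output is treated directly as the predicted action) are all assumptions made to keep the example short and dependency-free.

```python
import copy
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for real pretrained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenGeometricEncoder:
    """Hypothetical stand-in for the pretrained geometric vision model.

    It exposes no update method, so its weights stay fixed during training.
    """

    def __init__(self, in_dim, feat_dim, rng):
        self.weights = random_matrix(feat_dim, in_dim, rng)

    def __call__(self, image):
        return [math.tanh(v) for v in matvec(self.weights, image)]


class TrainableProjection:
    """Lightweight linear layer that adapts frozen features for the policy."""

    def __init__(self, feat_dim, out_dim, rng):
        self.weights = random_matrix(out_dim, feat_dim, rng)

    def __call__(self, features):
        return matvec(self.weights, features)

    def sgd_step(self, features, grad_output, lr):
        # For y = W f, d(loss)/dW[i][j] = grad_output[i] * features[j].
        for i, g in enumerate(grad_output):
            for j, f in enumerate(features):
                self.weights[i][j] -= lr * g * f


def train_step(encoder, projection, image, target_action, lr=0.1):
    """One gradient step that updates only the projection layer."""
    features = encoder(image)  # frozen: used for the forward pass only
    predicted = projection(features)
    error = [p - t for p, t in zip(predicted, target_action)]
    loss = 0.5 * sum(e * e for e in error)  # assumed squared-error loss
    projection.sgd_step(features, error, lr)
    return loss


rng = random.Random(0)
encoder = FrozenGeometricEncoder(in_dim=16, feat_dim=8, rng=rng)
projection = TrainableProjection(feat_dim=8, out_dim=7, rng=rng)

image = [rng.random() for _ in range(16)]  # toy flattened "image"
target_action = [rng.uniform(-1, 1) for _ in range(7)]  # toy 7-value action

encoder_before = copy.deepcopy(encoder.weights)
losses = [train_step(encoder, projection, image, target_action) for _ in range(50)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this sketch only `TrainableProjection` ever receives a gradient update, which mirrors the abstract's claim that the downstream policy does not have to learn 3D consistency from scratch. A real system would swap in an actual pretrained geometric model and a learned policy decoder.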

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 95 | 314 |
| Robot Manipulation | LIBERO Object | -- | -- | 70 |
| Robotic Manipulation | LIBERO Long | -- | -- | 44 |
| Robotic Manipulation | LIBERO Goal | -- | -- | 21 |
| Robot Manipulation | LIBERO (All four suites (combined)) | Spatial Success Rate | 87.1 | 18 |
| Robotic Manipulation | LIBERO Spatial | Average Success Rate | 79.7 | 17 |
| Robot Manipulation | LIBERO-V Across Novel Camera Viewpoints (unseen) | Spatial Success Rate | 94.3 | 14 |
| Robot Manipulation | CALVIN Original View | Success Rate | 93 | 5 |
| Robot Manipulation | CALVIN (Unseen Views) | Performance Score (View 1) | 96.5 | 5 |