QuoVLA: Quotient Space for Vision-Language-Action Models

About

Vision-Language-Action (VLA) models commonly adapt pretrained Vision-Language Models (VLMs) to robot control by mapping visual observations and language instructions to continuous actions. Existing approaches typically take an action-insufficiency view, assuming that pretrained VLM latents either lack directly usable action information or should be shielded from action-learning signals. Against this view, our Quotient Theory for VLA shows that pretrained VLM latents are not action-insufficient but action-sufficient: they already contain the information needed for control, yet remain overcomplete by distinguishing prompt-level variations that induce the same optimal action behavior. To operationalize this theory, we propose QuoVLA, a quotient-space framework for VLA that compresses pretrained VLM latents into action-sufficient representations. Specifically, QuoVLA instantiates this principle with a quantization module and a dual-branch design with relative temporal-complexity regularization, preserving action-relevant information while removing prompt-level redundancy. Extensive experiments across multiple benchmarks demonstrate that QuoVLA achieves strong performance, with particularly notable improvements in generalization under visual, linguistic, and environmental distribution shifts. Our code will be made publicly available.
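
As a rough illustration of the quotient idea in the abstract (several distinct latents collapsing to one representative when they call for the same action), the toy sketch below uses plain nearest-neighbor quantization against a fixed codebook. This is not the paper's quantization module, whose details the abstract does not give: the function name `quantize`, the 2-D codebook, and the sample latents are all hypothetical, chosen only to show how a many-to-one map removes variation that does not affect the chosen code.

```python
import math

def quantize(latent, codebook):
    """Return the index of the codebook entry nearest to `latent` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(latent, codebook[i]))

# Hypothetical 2-D codebook: each entry stands in for one class of
# latents that should lead to the same action (an assumption for this toy).
codebook = [(0.0, 0.0), (1.0, 1.0)]

# Two latents that differ only slightly (think: paraphrased prompts)
# and one that lies near the other codebook entry.
z_a = (0.1, -0.1)
z_b = (-0.05, 0.2)
z_c = (0.9, 1.2)

codes = [quantize(z, codebook) for z in (z_a, z_b, z_c)]
print(codes)  # [0, 0, 1]
```

Here `z_a` and `z_b` map to the same code even though they are different points, which is the many-to-one behavior a quotient map is meant to have; a learned system would train the codebook rather than fix it by hand.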

Xuan Wang, Yinan Wu, Haoran Duan, Jungong Han • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robotic Manipulation | LIBERO 1.0 (test) | Long | 98.7 | 57
Robotic Manipulation | LIBERO-Plus (test) | Language Robustness Score | 88.2 | 32
Robotic Manipulation | RoboTwin Easy 2.0 | Adjust Bottle Success Rate | 18 | 19
Robotic Manipulation | RoboTwin Hard 2.0 | Adjust Bottle Success Rate | 68 | 5
Robot Manipulation | LIBERO-PRO Spatial suite (test) | Orientation Score | 100 | 4
Robot Manipulation | LIBERO-PRO Goal suite (test) | Ori Success Rate | 100 | 4
Robot Manipulation | LIBERO-PRO Long suite (test) | Ori Success Rate | 99 | 4
Robot Manipulation | LIBERO-PRO Object suite (test) | Orientation Success | 100 | 4
Move tennis ball from yellow plate to blue plate | Real Robot | Success Rate | 97 | 2
Pick red cube into yellow plate | Real Robot | Success Rate | 74 | 2
Showing 10 of 13 rows
