
RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models

About

Vision-Language-Action (VLA) models have demonstrated robust performance across diverse robotic tasks. However, their high memory and computational demands often limit real-time deployment. Existing model compression techniques reduce the parameter footprint, but the compressed models often lose 3D spatial reasoning and scene layout understanding. This work introduces RetoVLA, an architecture designed to maintain spatial awareness in lightweight models by repurposing Register Tokens, learnable parameters originally introduced to mitigate attention artifacts in Vision Transformers. These tokens are usually discarded after they have served that purpose; we instead reuse them because they carry a dense representation of global spatial context. RetoVLA feeds these recycled tokens directly into the action-planning module through a dedicated spatial context injection path, recovering global context without increasing the total parameter count. Real-world experiments with a 7-DOF manipulator show a 17.1 percentage-point improvement in average success rate over the baseline. These results indicate that reusing internal register tokens is an effective mechanism for building efficient, spatially aware robotic agents. A video demonstration is available at: https://youtu.be/2CseBR-snZg
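The abstract does not describe how the spatial context injection path is built, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' implementation. It assumes a ViT encoder output laid out as [CLS, registers, patches], keeps the register outputs instead of discarding them, and lets each action-planner token attend over them with single-head scaled dot-product attention plus a residual add. To stay parameter-free it omits learned query/key/value projections. The helper names `split_encoder_output` and `inject_spatial_context` are illustrative inventions.

```python
import math
from typing import List, Tuple

Token = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_encoder_output(
    tokens: List[Token], num_registers: int
) -> Tuple[Token, List[Token], List[Token]]:
    # Hypothetical helper. ASSUMED layout: [CLS, reg_1..reg_R, patch_1..patch_P].
    # A standard ViT-with-registers pipeline would drop the register slice;
    # this sketch returns it so it can be reused downstream.
    cls_token = tokens[0]
    registers = tokens[1 : 1 + num_registers]
    patches = tokens[1 + num_registers :]
    return cls_token, registers, patches


def inject_spatial_context(
    action_tokens: List[Token], registers: List[Token]
) -> List[Token]:
    # Hypothetical injection path: each action token queries the register
    # tokens with single-head scaled dot-product attention (no learned
    # projections, hence no extra parameters) and adds the context residually.
    dim = len(registers[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for query in action_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, reg)) for reg in registers]
        weights = softmax(scores)
        context = [
            sum(w * reg[i] for w, reg in zip(weights, registers)) for i in range(dim)
        ]
        updated.append([q + c for q, c in zip(query, context)])
    return updated


if __name__ == "__main__":
    # Toy encoder output: CLS, two register tokens, two patch tokens (dim = 2).
    encoder_out = [[1.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
    _, regs, _ = split_encoder_output(encoder_out, num_registers=2)
    print(inject_spatial_context([[1.0, 1.0]], regs))  # [[2.0, 2.0]]
```

Dropping the learned projections is what keeps this sketch parameter-free, echoing the abstract's claim of no added parameters; whether RetoVLA's actual injection path works this way is not stated here.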

Jiyeon Koo, Taewan Cho, Hyunjoon Kang, Eunseom Pyo, Tae Gyun Oh, Taeryang Kim, Andrew Jaeyong Choi • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Goal Achievement | 80.4 | 700 |
| Pick-&-Place | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 92 | 4 |
| Build Domino Line | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 40 | 2 |
| Clean Marker on Mirror | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 0.52 | 2 |
| Close Drawer | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 96 | 2 |
| Move Bowl | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 38 | 2 |
| Robot Manipulation (Overall) | Real-world (custom 7-DOF robot arm) | Mean Success Rate (MSR) | 67.42 | 2 |
| Stack by Size | Real-world (custom 7-DOF robot arm) | Success Rate (SR) | 76 | 2 |
