
See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model

About

Vision-Language-Action (VLA) models have shown remarkable promise in robotic manipulation, yet their high computational cost hinders real-time deployment. Existing token pruning methods suffer from a fundamental trade-off: aggressive compression through pruning inevitably discards critical geometric details such as contact points, leading to severe performance degradation. This forces a compromise that limits the achievable compression rate and thus the potential speedup. We argue that breaking this trade-off requires rethinking compression as geometry-aware, continuous token resampling in the vision encoder. To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLA models. By adaptively predicting a minimal set of salient coordinates and extracting features via differentiable interpolation, GridS preserves essential spatial information while achieving drastic compression (fewer than 10% of the original visual tokens). Experiments on both the LIBERO benchmark and a real robotic platform demonstrate that GridS achieves a 76% reduction in FLOPs with no degradation in success rate, while validating the lowest feasible visual token count reported to date. The code is available at https://github.com/Fediory/Grid-Sampler.
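The resampling step described in the abstract (extracting features at continuous coordinates by interpolation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bilinear interpolation, which the abstract does not specify, and it uses hand-picked coordinates in place of the learned coordinate predictor. The names `bilinear_sample` and `resample_tokens` are hypothetical.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a feature grid at a continuous (x, y) location.

    feat is indexed as feat[row][col][channel]; x is the column
    coordinate and y the row coordinate, both in grid units.
    Bilinear interpolation is an assumption made for this sketch.
    """
    h, w = len(feat), len(feat[0])
    # Clamp so that out-of-range coordinates fall on the border.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    channels = len(feat[0][0])
    # The output is a weighted sum of the four neighbouring tokens. The
    # weights are piecewise-linear in (x, y), which is what lets gradients
    # reach the coordinates in an autodiff framework.
    return [
        (1 - wy) * ((1 - wx) * feat[y0][x0][c] + wx * feat[y0][x1][c])
        + wy * ((1 - wx) * feat[y1][x0][c] + wx * feat[y1][x1][c])
        for c in range(channels)
    ]


def resample_tokens(feat, coords):
    """Return one interpolated token per (x, y) coordinate."""
    return [bilinear_sample(feat, x, y) for x, y in coords]


# Toy 4x4 grid with 2 channels: channel 0 stores the column index,
# channel 1 the row index, so a sampled token equals its own coordinate.
feat = [[[float(c), float(r)] for c in range(4)] for r in range(4)]

# Hand-picked coordinates standing in for the predicted salient points.
coords = [(0.5, 0.5), (2.25, 1.0), (3.0, 3.0)]
tokens = resample_tokens(feat, coords)
```

Here 16 grid tokens are reduced to 3 sampled tokens. Unlike pruning, which can only keep or drop tokens that already exist on the grid, the sampled locations may fall between grid cells.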

Yixu Feng, Zinan Zhao, Yanxiang Ma, Chenghao Xia, Chengbin Du, Yunke Wang, Chang Xu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic task execution | LIBERO | Average Success Rate | 97.7 | 44 |
| Robot Manipulation | LIBERO-Plus Zero-shot | Camera Score | 82.8 | 42 |
| Robot Manipulation | LIBERO-Spatial LIBERO-PLUS (OOD) | Success Rate (Level 1 - Easiest) | 92.5 | 4 |
| Vision-Language-Action | LIBERO-Goal PLUS Zero-shot OOD | Success Rate (Background Textures) | 95 | 2 |
| Bimanual Manipulation | ALOHA | Env. Reward | 2.38 | 2 |
| Robot Manipulation | LIBERO-10 OOD LIBERO-PLUS | Success Rate (Level 1) | 91 | 2 |
| Robot Manipulation | LIBERO-Goal (LIBERO-PLUS) Zero-shot OOD | Success Rate L1 | 93.5 | 2 |
| Vision-Language-Action | LIBERO-10 PLUS OOD (test) | Success Rate (Background Textures) | 84.1 | 2 |
