
DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

About

Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain suboptimal for VLAs due to two critical challenges: (1) Temporal-dynamic sensitivity, where a fixed precision wastes resources by ignoring stage-varying error tolerances; and (2) Real-time allocation, where sensitivity must be identified on the fly to guide bit allocation, which remains unsolved. To address these challenges, we propose DyQ-VLA, a dynamic quantization framework for VLAs. Specifically, a sensitivity-aware switching strategy leverages real-time kinematic proxies to trigger bit-width switches, while a kinematic-guided module dynamically allocates the optimal bit-width. Experiments show that DyQ-VLA requires only 30.9% of the original memory footprint while maintaining 99.5% of the original performance, achieving a 1.49x speedup in simulation and up to 1.43x in real-world deployment.
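The core idea, mapping a real-time kinematic proxy to a per-stage bit-width, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the proxy (joint-velocity norm), the thresholds, and the bit-width levels are all assumptions chosen for illustration.

```python
# Hypothetical sketch of a kinematic-proxy bit-width policy.
# All names, thresholds, and bit levels are illustrative assumptions,
# not taken from the DyQ-VLA paper.
import numpy as np

def kinematic_proxy(joint_velocities: np.ndarray) -> float:
    """Proxy for stage sensitivity: the joint-velocity norm."""
    return float(np.linalg.norm(joint_velocities))

def select_bitwidth(proxy: float,
                    low_thresh: float = 0.05,
                    high_thresh: float = 0.5) -> int:
    """Map the proxy to a quantization bit-width.

    Slow, precise motions (e.g. grasping) tolerate less quantization
    error, so they get more bits; fast free-space transport tolerates
    more error and can run at a lower precision.
    """
    if proxy < low_thresh:
        return 8   # precision-critical stage
    if proxy < high_thresh:
        return 6   # intermediate stage
    return 4       # coarse transport stage

# A fast transport motion selects a low bit-width,
# while a slow fine-manipulation motion selects a high one.
fast = np.array([0.4, 0.3, 0.5])
slow = np.array([0.01, 0.0, 0.02])
print(select_bitwidth(kinematic_proxy(fast)))  # prints 4
print(select_bitwidth(kinematic_proxy(slow)))  # prints 8
```

In a full system, such a policy would gate which quantized copy of the model weights serves the next inference step, so the switch itself stays cheap relative to re-quantizing on the fly.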

Zihao Zheng, Hangyu Cao, Sicheng Tian, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen · 2026

Related benchmarks

| Task                 | Dataset        | Success Rate (%) | Rank |
|----------------------|----------------|------------------|------|
| Robot Manipulation   | LIBERO Object  | 87.8             | 70   |
| Robotic Manipulation | LIBERO Long    | 53.4             | 44   |
| Robot Manipulation   | LIBERO Spatial | --               | 41   |
| Robotic Manipulation | LIBERO Goal    | 78.5             | 21   |
| Atomic Grasping      | Real-world     | 86.7             | 4    |
| Spatial Displacement | Real-world     | 73.3             | 4    |
| Composite Sequential | Real-world     | 66.7             | 2    |
