
DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models

About

Vision-Language-Action models (VLAs) have demonstrated strong potential for embodied AI, yet their deployment on resource-limited robots remains challenging due to high memory and computational demands. While Post-Training Quantization (PTQ) provides an efficient solution, directly applying PTQ to VLAs often results in severe performance degradation during sequential control. We identify temporal error accumulation as a key factor, where quantization perturbations at the vision-language-to-action interface are progressively amplified, leading to kinematic drift in executed trajectories. To address this issue, we propose Drift-Aware Post-Training Quantization (DA-PTQ), which formulates quantization as a drift-aware optimization problem over sequential decision processes. DA-PTQ consists of two components: (1) Cross-Space Representation Compensation, which mitigates structured distortions between multimodal representations and action space to improve action consistency, and (2) Motion-Driven Mixed-Precision Allocation, which assigns bit-widths by minimizing trajectory-level motion errors. Extensive experiments show that DA-PTQ significantly reduces kinematic drift and achieves comparable performance to full-precision models under low-bit settings, enabling practical deployment of VLAs on resource-limited robotic platforms.
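The temporal error accumulation described above can be illustrated with a toy simulation (this is an illustrative sketch, not the paper's method or models): small per-step perturbations on predicted actions, standing in for quantization error at the vision-language-to-action interface, compound into growing kinematic drift once actions are integrated into a trajectory.

```python
import numpy as np

# Toy illustration of temporal error accumulation: per-step quantization
# noise on actions is tiny, but integrating actions over a long horizon
# lets the errors compound into trajectory-level kinematic drift.
rng = np.random.default_rng(0)

def rollout(steps=200, noise_std=0.0):
    """Integrate noisy 2-D actions; noise_std models quantization error."""
    pos = np.zeros(2)
    trajectory = []
    for _ in range(steps):
        action = np.array([0.01, 0.0])                  # nominal full-precision action
        action = action + rng.normal(0, noise_std, 2)   # per-step quantization perturbation
        pos = pos + action                              # kinematic integration
        trajectory.append(pos.copy())
    return np.array(trajectory)

clean = rollout(noise_std=0.0)       # full-precision trajectory
quant = rollout(noise_std=0.002)     # quantized trajectory (hypothetical noise level)

# End-point drift grows with horizon even though each step's error is tiny.
final_drift = np.linalg.norm(quant[-1] - clean[-1])
print(final_drift)
```

Under this random-walk model the drift grows roughly with the square root of the horizon, which is why errors that are negligible per step still derail long sequential control, the failure mode DA-PTQ targets.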

Siyuan Xu, Tianshi Wang, Fengling Li, Lei Zhu, Heng Tao Shen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | SimplerEnv Google Robot tasks, Variant Aggregation | Average Success Rate: 51.7 | 67
Robot Manipulation | SimplerEnv Google Robot, Visual Matching | Pick Coke Can: 92.4 | 43
Robotic Manipulation | SimplerEnv WidowX, In-domain Visual Matching setting | Success Rate (Spoon on Towel): 65.2 | 4
Robotic Manipulation | SimplerEnv | Success Rate: 48.9 | 3
