EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models

About

Vision-Language-Action (VLA) models, particularly diffusion-based architectures, demonstrate transformative potential for embodied intelligence but are severely hampered by high computational and memory demands stemming from extensive inherent and inference-time redundancies. While existing acceleration efforts often target isolated inefficiencies, such piecemeal solutions typically fail to holistically address the varied computational and memory bottlenecks across the entire VLA pipeline, thereby limiting practical deployability. We introduce EfficientVLA, a structured and training-free inference acceleration framework that systematically eliminates these barriers by cohesively exploiting multifaceted redundancies. EfficientVLA integrates three targeted strategies: (1) pruning functionally inconsequential layers from the language module, guided by an analysis of inter-layer redundancies; (2) optimizing the visual processing pathway through a task-aware strategy that selects a compact, diverse set of visual tokens, balancing task-criticality with informational coverage; and (3) alleviating temporal computational redundancy within the iterative diffusion-based action head by strategically caching and reusing key intermediate features. We apply our method to a standard VLA model, CogACT, yielding a 1.93× inference speedup and reducing FLOPs to 28.9% of the original, with only a 0.6% success rate drop on the SIMPLER benchmark.
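
As a rough illustration of strategy (1), the sketch below ranks layers by how little they change their input and selects the least influential ones for pruning. The abstract does not say how inter-layer redundancy is measured, so the cosine-similarity score used here is an assumption, and the names `layer_importance`, `layers_to_prune`, and `toy_states` are hypothetical and not taken from the EfficientVLA code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layer_importance(hidden_states):
    """Score each layer by how much it changes its input.

    hidden_states[i] holds the token vectors entering layer i, and
    hidden_states[i + 1] holds the token vectors leaving it.
    Assumed score: 1 - mean cosine similarity between input and output tokens,
    so a score near 0 marks a layer that barely transforms its input.
    """
    scores = []
    for layer_in, layer_out in zip(hidden_states, hidden_states[1:]):
        sims = [cosine_similarity(x, y) for x, y in zip(layer_in, layer_out)]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


def layers_to_prune(hidden_states, num_prune):
    """Return the indices of the num_prune lowest-scoring layers, in ascending order."""
    scores = layer_importance(hidden_states)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:num_prune])


# Toy activations: 3 layers, 2 tokens, 2 dimensions (made-up numbers).
toy_states = [
    [[1.0, 0.0], [0.0, 1.0]],    # input to layer 0
    [[1.0, 0.01], [0.01, 1.0]],  # layer 0 barely changes its input
    [[0.0, 1.0], [1.0, 0.0]],    # layer 1 changes it strongly
    [[0.0, 1.0], [1.0, 0.02]],   # layer 2 barely changes its input
]

print(layers_to_prune(toy_states, num_prune=2))
```

In practice the hidden states would be collected from a calibration set of real inputs; no retraining is involved in this kind of scoring, which fits the training-free setting described above.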

Yantai Yang, Yuhao Wang, Zichen Wen, Luo Zhongwei, Chang Zou, Zhipeng Zhang, Chuan Wen, Linfeng Zhang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Robot Manipulation	LIBERO	Object Achievement	91.1	1025
Robotic Manipulation	LIBERO	Long-horizon Success Rate	72.1	165
Robot Manipulation	SimplerEnv Google Robot tasks Variant Aggregation	Average Success Rate	63.18	109
Robot Manipulation	LIBERO	Spatial Success	96.5	90
Robotic Manipulation	SIMPLER Google Robot VA	Pick Up Coke Can Success Rate	94.4	35
Robotic Manipulation	SIMPLER Visual Matching	Average Success Rate	76.13	31
Vision-Language-Action	LIBERO	Success Rate (Spatial)	96.5	28
Robot Manipulation	LIBERO Spatial Object Goal Long	Spatial Success Score	84.3	26
Multi-stage Robotic Manipulation	Kitchen (test)	Success Rate (Kit_p1)	20	15
Lift	Mixed Human (MH) demonstration data	Success Rate	100	9

Showing 10 of 23 rows
