Environment-Aware Adaptive Pruning with Interleaved Inference Orchestration for Vision-Language-Action Models
About
While Vision-Language-Action (VLA) models hold promise in embodied intelligence, their large parameter counts lead to substantial inference latency that hinders real-time manipulation, motivating parameter sparsification. However, as the environment evolves during VLA execution, the optimal sparsity patterns change accordingly. Static pruning lacks the adaptability required for environment dynamics, whereas fixed-interval dynamic layer pruning suffers from coarse granularity and high retraining overheads. To bridge this gap, we propose EcoVLA, a training-free, plug-and-play adaptive pruning framework that supports orthogonal combination with existing VLA acceleration methods. EcoVLA comprises two components: Environment-aware Adaptive Pruning (EAP) and Interleaved Inference Orchestration ($I^2O$). EAP is a lightweight adaptive channel pruning method that incorporates the temporal consistency of the physical environment to update sparsity patterns. $I^2O$ leverages the FLOPs bubbles inherent in VLA inference to schedule the pruning method in parallel, ensuring negligible impact on latency. Evaluated on diverse VLA models and benchmarks, EcoVLA delivers state-of-the-art performance, achieving up to 1.60$\times$ speedup with only a 0.4% drop in success rate, and further reaches 2.18$\times$ speedup with only a 0.5% degradation when combined with token pruning. We further validate the effectiveness of EcoVLA on real-world robots.
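The abstract describes EAP only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes channel importance is the mean absolute activation and that temporal consistency is modeled by an exponential moving average across control steps. The function names (`channel_importance`, `update_scores`, `keep_mask`) and the `momentum` and `sparsity` parameters are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Sequence


def channel_importance(activations: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute activation per channel (rows are tokens, columns are channels).

    Assumption: this magnitude-based score stands in for whatever
    importance criterion EcoVLA actually uses.
    """
    n_tokens = len(activations)
    n_channels = len(activations[0])
    return [
        sum(abs(row[c]) for row in activations) / n_tokens
        for c in range(n_channels)
    ]


def update_scores(
    prev: Optional[Sequence[float]],
    current: Sequence[float],
    momentum: float = 0.9,
) -> List[float]:
    """Blend new scores with the running scores via an exponential moving average.

    Assumption: an EMA is one simple way to exploit the fact that the
    scene changes slowly between consecutive control steps.
    """
    if prev is None:
        return list(current)
    return [momentum * p + (1.0 - momentum) * c for p, c in zip(prev, current)]


def keep_mask(scores: Sequence[float], sparsity: float) -> List[bool]:
    """Return True for channels to keep, dropping the lowest-scoring fraction."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [i in kept for i in range(len(scores))]


if __name__ == "__main__":
    # Two consecutive control steps with toy activations (4 channels).
    step_a = [[1.0, 0.1, 0.5, 0.0], [-1.0, 0.2, 0.5, 0.0]]
    step_b = [[0.0, 2.0, 0.6, 0.0]]

    scores = update_scores(None, channel_importance(step_a))
    print(keep_mask(scores, sparsity=0.5))

    # The environment changes: the running scores shift, and so does the mask.
    scores = update_scores(scores, channel_importance(step_b), momentum=0.5)
    print(keep_mask(scores, sparsity=0.5))
```

In a setup like the one the abstract describes, the score update and mask computation would run alongside the main forward pass rather than in sequence with it; this sketch runs them inline only to stay self-contained.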
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | SimplerEnv Google Robot tasks Visual Matching | Pick Coke Can Success Rate | 95 | 62 |
| Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Pick Coke Can Success Rate | 86.1 | 44 |
| Robot Manipulation | LIBERO OpenVLA-OFT | LIBERO Spatial Success | 0.974 | 11 |
| Robot Task Execution | LIBERO fixed task suites π0.5 | LIBERO Spatial Success Rate | 98.2 | 3 |
| Task 1: Place the apple in the basket | Kinova Gen3 Platform Real-world (test) | Latency (ms) | 68.4 | 2 |
| Task 2: Put the pill bottle in the cabinet | Kinova Gen3 Platform Real-world robot evaluation (test) | Latency (ms) | 68.4 | 2 |
| Task 3: Place the banana in the basket | Kinova Gen3 Platform Real-world (test) | Latency (ms) | 68.4 | 2 |