
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

About

Vision-Language-Action (VLA) models have become the mainstream solution for robot control but suffer from slow inference. Speculative decoding (SD) is a promising acceleration method that falls into two categories: drafter-based SD and retrieval-based SD. The two show complementary advantages and limitations when applied to VLA models, which leads to the hypothesis that a hybrid approach integrating both will perform better. In this paper, we first conduct a series of detailed analyses to reveal the advantages and feasibility of hybrid use. Even with these insights, implementing hybrid SD in VLA models presents two challenges: (1) draft rejection and persistent errors in retrieval-based SD; (2) difficulty in determining the hybrid boundary. To address these, we propose the HeiSD framework. HeiSD includes a retrieval-based SD optimization method that combines a verify-skip mechanism with a sequence-wise relaxed acceptance strategy. It also includes a kinematic-based fused metric that automatically determines the hybrid boundary. Experimental results demonstrate that HeiSD attains a speedup of up to 2.45x in simulation benchmarks and 2.06x to 2.41x in real-world scenarios, while sustaining a high task success rate.
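To make the acceptance step concrete, here is a minimal sketch of how drafted action tokens might be checked against the verifier's tokens. It is not the paper's algorithm: the abstract does not define the relaxed acceptance rule, so the function names, the assumption that action tokens are integer bin indices, and both tolerance parameters are hypothetical choices made for illustration.

```python
from typing import Sequence


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int], tol: int = 0) -> int:
    """Count leading draft tokens that lie within `tol` bins of the verifier's tokens.

    tol=0 is strict greedy verification: the draft is cut at the first mismatch.
    tol>0 is a hypothetical token-wise relaxation (assumption, not from the paper).
    """
    n = 0
    for d, t in zip(draft, target):
        if abs(d - t) > tol:
            break
        n += 1
    return n


def accept_sequence(draft: Sequence[int], target: Sequence[int], mean_tol: float) -> bool:
    """Hypothetical sequence-wise rule: accept the whole draft when its mean
    absolute bin deviation from the verifier's tokens is at most `mean_tol`."""
    if not draft or len(draft) != len(target):
        return False
    mean_dev = sum(abs(d - t) for d, t in zip(draft, target)) / len(draft)
    return mean_dev <= mean_tol


# Example: a retrieved draft that drifts slightly from the verifier's output.
draft = [10, 20, 31, 40]
target = [10, 20, 30, 44]
strict = accepted_prefix_len(draft, target)            # stops at the third token
relaxed = accepted_prefix_len(draft, target, tol=1)    # tolerates a one-bin drift
whole = accept_sequence(draft, target, mean_tol=1.5)   # judges the draft as a unit
```

Under strict matching, a single off-by-one bin discards the rest of the draft; a relaxed rule of this kind trades a small action deviation for more accepted tokens per verification pass, which is the kind of trade-off the abstract alludes to.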

Zihao Zheng, Zhihao Mao, Sicheng Tian, Maoliang Li, Jiayu Chen, Xinhao Sun, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | LIBERO Object | Success Rate: 71 | 127 |
| Robotic Manipulation Simulation | LIBERO Goal | Success Rate: 73 | 5 |
| Robotic Manipulation Simulation | LIBERO Long | Success Rate: 47 | 5 |
| Robotic Manipulation Simulation | LIBERO Spatial | Success Rate (SR): 78 | 5 |
| Spatial Displacement | Real-world | SR: 75.1 | 4 |
| Atomic Grasping | Real-world | Success Rate (SR): 86 | 4 |
| Composite Sequential | Real-world | Success Rate (SR): 67.8 | 2 |
