ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

About

Recent Vision-Language-Action (VLA) models have shown impressive flexibility and generalization, yet their deployment in robotic manipulation remains limited by heavy computational overhead and inference latency. In this work, we present ActDistill, a general action-guided self-derived distillation framework that transfers the action prediction capability of any existing VLA model to a lightweight counterpart. Unlike previous efficiency strategies that primarily emphasize vision-language correlations, ActDistill leverages action priors to guide knowledge transfer and model compression, achieving action-oriented efficiency for VLA models. Specifically, we employ a well-trained VLA model as the teacher and introduce a graph-structured encapsulation strategy to explicitly model the hierarchical evolution of action prediction. The student model, derived from the graph-encapsulated teacher, is further equipped with a dynamic router that adaptively selects computation paths based on action prediction demands, guided by hierarchical graph-informed supervision to ensure smooth and efficient evolution. During inference, graph-related auxiliary components are removed, allowing the student to execute only dynamically routed layers and predict high-precision actions with minimal computation and latency. Experiments on embodied benchmarks demonstrate that ActDistill achieves comparable or superior performance to full-scale VLA models while reducing computation by over 50% with up to 1.67 times speedup, thereby establishing a general paradigm toward efficient embodied intelligence.
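
The dynamic routing idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the router's form, so the sigmoid gate, the fixed threshold, the residual toy layers, and all names here (`make_layer`, `router_score`, `routed_forward`) are hypothetical. The sketch only shows the general pattern of running a subset of layers at inference time and treating skipped layers as identity.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float) -> Layer:
    """Toy residual layer (assumption): x + scale * tanh(x)."""
    def layer(x: Vector) -> Vector:
        return [v + scale * math.tanh(v) for v in x]
    return layer


def router_score(x: Vector, weights: Sequence[float], bias: float) -> float:
    """Hypothetical gate: sigmoid of a linear read-out of the hidden state."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def routed_forward(
    x: Vector,
    layers: Sequence[Layer],
    gates: Sequence[Tuple[Sequence[float], float]],
    threshold: float = 0.5,
) -> Tuple[Vector, List[int]]:
    """Run only the layers whose gate score reaches the threshold.

    A skipped layer acts as identity, so its computation is saved.
    Returns the final hidden state and the indices of executed layers.
    """
    executed: List[int] = []
    for index, (layer, (weights, bias)) in enumerate(zip(layers, gates)):
        if router_score(x, weights, bias) >= threshold:
            x = layer(x)
            executed.append(index)
    return x, executed


# Demo: four layers; zero gate weights make the decision depend on the bias only.
hidden = [0.5, -1.0, 2.0]
layers = [make_layer(0.1) for _ in range(4)]
zero = [0.0, 0.0, 0.0]
gates = [(zero, 5.0), (zero, -5.0), (zero, 5.0), (zero, -5.0)]

output, executed = routed_forward(hidden, layers, gates)
print(executed)  # [0, 2]
```

In this demo half of the layers are skipped, which loosely mirrors the reported computation reduction of over 50%; in a real model the gate would be learned, and the supervision used to train it is not shown here.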

Wencheng Ye, Tianshi Wang, Lei Zhu, Fengling Li, Guoli Yang, Hengtao Shen • 2025

Related benchmarks

Task	Dataset	Result
Robot Manipulation	SimplerEnv Google Robot tasks Variant Aggregation	Average Success Rate: 61.78	88
Robot Manipulation	LIBERO	Spatial Success Rate: 81.8	46
Robotic Manipulation	SIMPLER Visual Matching	Average Success Rate: 74.08	31
Robotic Manipulation	ARX5 Real-World	Task 1 Success Rate: 80	3
