X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning

About

Visuomotor policies often leverage large pre-trained Vision Transformers (ViTs) for their powerful generalization capabilities. However, their significant data requirements present a major challenge in the data-scarce context of most robotic learning settings, where compact CNNs with strong inductive biases can be more easily optimized. To address this trade-off, we introduce X-Distill, a simple yet highly effective method that synergizes the strengths of both architectures. Our approach involves an offline, cross-architecture knowledge distillation, transferring the rich visual representations of a large, frozen DINOv2 teacher to a compact ResNet-18 student on the general-purpose ImageNet dataset. This distilled encoder, now endowed with powerful visual priors, is then jointly fine-tuned with a diffusion policy head on the target manipulation tasks. Extensive experiments on 34 simulated benchmarks and 5 challenging real-world tasks demonstrate that our method consistently outperforms policies equipped with from-scratch ResNet or fine-tuned DINOv2 encoders. Notably, X-Distill also surpasses 3D encoders that utilize privileged point cloud observations or much larger Vision-Language Models. Our work highlights the efficacy of a simple, well-founded distillation strategy for achieving state-of-the-art performance in data-efficient robotic manipulation.
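The offline distillation stage described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the abstract does not state the loss, feature layer, or projection used, so the MSE feature-matching objective, the `TinyTeacher` and `TinyStudent` stand-in networks, the `distill_step` helper, and the 384-dimensional feature width are all assumptions chosen for illustration. In practice the teacher would be a frozen DINOv2 model, the student a ResNet-18, and the batch would come from ImageNet rather than random tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM = 384  # hypothetical teacher feature width (assumption)


class TinyTeacher(nn.Module):
    """Stand-in for a large frozen ViT teacher; NOT the real DINOv2."""

    def __init__(self, feat_dim=FEAT_DIM):
        super().__init__()
        # Patch embedding: one vector per non-overlapping 16x16 patch.
        self.patchify = nn.Conv2d(3, feat_dim, kernel_size=16, stride=16)

    def forward(self, x):
        # Mean-pool the patch embeddings into one global feature per image.
        return self.patchify(x).flatten(2).mean(dim=2)


class TinyStudent(nn.Module):
    """Stand-in for a compact CNN student; NOT a real ResNet-18."""

    def __init__(self, feat_dim=FEAT_DIM):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Linear projection so student features match the teacher's width.
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):
        return self.proj(self.backbone(x))


def distill_step(student, teacher, images, optimizer):
    """One feature-matching update; only the student is trained."""
    teacher.eval()
    with torch.no_grad():  # the teacher stays frozen
        target = teacher(images)
    loss = F.mse_loss(student(images), target)  # assumed objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
teacher = TinyTeacher()
for p in teacher.parameters():
    p.requires_grad_(False)
student = TinyStudent()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)  # random stand-in for an ImageNet batch
losses = [distill_step(student, teacher, images, optimizer) for _ in range(20)]
```

The sketch covers only the first stage. In the second stage described in the abstract, the distilled student encoder would be fine-tuned jointly with a diffusion policy head on the manipulation data, which is not shown here.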

Maanping Shao, Feihong Zhang, Gu Zhang, Baiye Cheng, Zhengrong Xue, Huazhe Xu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | MetaWorld 50 tasks | Success Rate (Easy): 93.9 | 21 |
| Robot Manipulation | Adroit | Success Rate: 68.3 | 18 |
| Robot Manipulation | DexArt | Success Rate: 63.5 | 14 |
| Robot Manipulation | MetaWorld, Adroit, and Dexart Combined | Average Success Rate: 87.2 | 6 |
| Door Close | Door Close X-Arm 6 Tabletop (Out-of-Distribution) | Success Rate: 100 | 4 |
| Drawer-Open | Drawer Open X-Arm 6 Tabletop (OOD) | Success Rate: 53.3 | 4 |
| Move Brush | Move Brush X-Arm 6 Tabletop (In-Distribution) | Success Rate: 75 | 4 |
| Move Brush | Move Brush Out-of-Distribution X-Arm 6 Tabletop | Success Rate: 25 | 4 |
| Move Cube | Move Cube X-Arm 6 Tabletop (In-Distribution) | Success Rate: 93.3 | 4 |
| Move Cube | Move Cube X-Arm 6 Tabletop OOD | Success Rate: 40 | 4 |
Showing 10 of 15 rows
