X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning
About
Visuomotor policies often leverage large pre-trained Vision Transformers (ViTs) for their powerful generalization capabilities. However, the significant data requirements of ViTs present a major challenge in the data-scarce context of most robotic learning settings, where compact CNNs with strong inductive biases can be more easily optimized. To address this trade-off, we introduce X-Distill, a simple yet highly effective method that combines the strengths of both architectures. Our approach performs an offline, cross-architecture knowledge distillation step that transfers the rich visual representations of a large, frozen DINOv2 teacher to a compact ResNet-18 student on the general-purpose ImageNet dataset. This distilled encoder, now endowed with powerful visual priors, is then jointly fine-tuned with a diffusion policy head on the target manipulation tasks. Extensive experiments on 34 simulated benchmarks and 5 challenging real-world tasks demonstrate that our method consistently outperforms policies equipped with from-scratch ResNet or fine-tuned DINOv2 encoders. Notably, X-Distill also surpasses 3D encoders that use privileged point cloud observations, as well as much larger Vision-Language Models. Our work highlights the efficacy of a simple, well-founded distillation strategy for achieving state-of-the-art performance in data-efficient robotic manipulation.
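The distillation stage described above can be illustrated with a dependency-free toy sketch. It is not the authors' implementation: the abstract does not specify the loss or the feature-alignment scheme, so a mean-squared-error feature-matching objective is assumed here. The tiny `teacher_features` and `student_features` functions are hypothetical stand-ins for the frozen DINOv2 teacher and the trainable ResNet-18 student, and random vectors stand in for ImageNet images. The one point it aims to show is that only the student is updated while the teacher stays fixed.

```python
import math
import random


def teacher_features(x, weights):
    """Frozen teacher stand-in: fixed linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def student_features(x, weights):
    """Trainable student stand-in: plain linear map."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean-squared error between two feature vectors (assumed objective)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill(num_steps=2000, lr=0.05, in_dim=4, feat_dim=3, seed=0):
    """Fit the student to the frozen teacher's features with plain SGD.

    Returns the mean feature-matching loss before and after training.
    """
    rng = random.Random(seed)
    # Teacher weights are sampled once and never updated (frozen).
    teacher_w = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                 for _ in range(feat_dim)]
    # Student starts from zeros and is the only set of weights trained.
    student_w = [[0.0] * in_dim for _ in range(feat_dim)]
    # Random vectors stand in for the distillation images.
    data = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(64)]

    def mean_loss():
        return sum(
            mse(student_features(x, student_w), teacher_features(x, teacher_w))
            for x in data
        ) / len(data)

    initial = mean_loss()
    for _ in range(num_steps):
        x = rng.choice(data)
        target = teacher_features(x, teacher_w)  # treated as a constant
        pred = student_features(x, student_w)
        for j in range(feat_dim):
            # Gradient of the MSE with respect to the j-th student output.
            grad_out = 2.0 * (pred[j] - target[j]) / feat_dim
            for i in range(in_dim):
                student_w[j][i] -= lr * grad_out * x[i]
    return initial, mean_loss()


if __name__ == "__main__":
    before, after = distill()
    print(f"feature-matching loss before: {before:.4f}, after: {after:.4f}")
```

In the second stage described in the abstract, the distilled student would be used as the visual encoder and fine-tuned jointly with the diffusion policy head. That stage is omitted here because the abstract gives no details on it.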
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | MetaWorld 50 tasks | Success Rate (Easy) | 93.9 | 21 |
| Robot Manipulation | Adroit | Success Rate | 68.3 | 18 |
| Robot Manipulation | DexArt | Success Rate | 63.5 | 14 |
| Robot Manipulation | MetaWorld, Adroit, and Dexart Combined | Average Success Rate | 87.2 | 6 |
| Door Close | Door Close X-Arm 6 Tabletop (Out-of-Distribution) | Success Rate | 100 | 4 |
| Drawer-Open | Drawer Open X-Arm 6 Tabletop (OOD) | Success Rate | 53.3 | 4 |
| Move Brush | Move Brush X-Arm 6 Tabletop (In-Distribution) | Success Rate | 75 | 4 |
| Move Brush | Move Brush Out-of-Distribution X-Arm 6 Tabletop | Success Rate | 25 | 4 |
| Move Cube | Move Cube X-Arm 6 Tabletop (In-Distribution) | Success Rate | 93.3 | 4 |
| Move Cube | Move Cube X-Arm 6 Tabletop OOD | Success Rate | 40 | 4 |