Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
About
Semi-supervised learning (SSL) has emerged as a practical solution for addressing data scarcity challenges by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance that often surpasses SSL approaches due to their exceptional generalization capabilities. This gap motivates us to question: how can we effectively harness the powerful generalization capabilities of VLMs into task-specific models? Knowledge distillation (KD) offers a natural framework for transferring VLM capabilities, but we identify that it suffers from gradient conflicts between supervised and distillation losses. To address this challenge, we propose Dual-Head Optimization (DHO), which introduces dual prediction heads for each distinct signal. We observe that DHO resolves gradient conflicts, enabling improved feature learning compared to single-head KD baselines, with practical benefits of minimal computational overhead and test-time hyperparameter tuning without retraining. Extensive experiments across 15 datasets show that DHO consistently outperforms KD baselines, often outperforming teacher models with smaller student models. DHO also achieves new state-of-the-art performance on both in-distribution ImageNet semi-supervised learning and out-of-distribution generalization across ImageNet variants. We publicly release our code and model checkpoints to facilitate future research at https://github.com/erjui/DHO.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Image Classification | ImageNet-1k (val) | -- | 1453 | |
| Image Classification | ImageNet (val) | Top-1 Acc85.9 | 1206 | |
| Image Classification | ImageNet A | Top-1 Acc64.4 | 553 | |
| Image Classification | ImageNet V2 | Top-1 Acc77.8 | 487 | |
| Image Classification | ImageNet-R | Top-1 Acc82.8 | 474 | |
| Image Classification | ImageNet-Sketch | Top-1 Accuracy61.7 | 360 | |
| Image Classification | ImageNet 1.0 (10% labeled) | Accuracy85.9 | 33 | |
| Image Classification | ImageNet 1% labeled 1.0 | Accuracy84.6 | 21 |