Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

About

Semi-supervised learning (SSL) has emerged as a practical solution to data scarcity by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance that often surpasses SSL approaches due to their exceptional generalization capabilities. This gap motivates our question: how can we effectively transfer the powerful generalization capabilities of VLMs into task-specific models? Knowledge distillation (KD) offers a natural framework for transferring VLM capabilities, but we identify that it suffers from gradient conflicts between the supervised and distillation losses. To address this challenge, we propose Dual-Head Optimization (DHO), which introduces two prediction heads, one for each distinct signal. We observe that DHO resolves gradient conflicts, enabling improved feature learning compared to single-head KD baselines, with the practical benefits of minimal computational overhead and test-time hyperparameter tuning without retraining. Extensive experiments across 15 datasets show that DHO consistently outperforms KD baselines, with smaller student models often surpassing their teacher models. DHO also achieves new state-of-the-art performance on both in-distribution ImageNet semi-supervised learning and out-of-distribution generalization across ImageNet variants. To facilitate future research, we publicly release our code and model checkpoints at https://github.com/erjui/DHO.
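
The dual-head idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository); it assumes a shared feature vector feeding two linear heads, cross-entropy on the supervised head, a KL-divergence distillation term on the other head, and a test-time interpolation weight `alpha` between the two heads' probabilities. All function names, the `kd_weight` parameter, and the interpolation rule are hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, features):
    # One prediction head: logits = W @ features + b.
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    # Supervised loss on a ground-truth class index.
    return -math.log(softmax(logits)[label])

def kl_divergence(teacher_probs, student_logits):
    # Distillation loss KL(teacher || student); assumed form, not from the source.
    student_probs = softmax(student_logits)
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)

def dual_head_loss(features, label, teacher_probs, ce_head, kd_head, kd_weight=1.0):
    # Each loss reaches only its own head, so the two signals do not
    # compete inside a single classifier; they meet in the shared features.
    loss = kd_weight * kl_divergence(teacher_probs, linear(*kd_head, features))
    if label is not None:  # unlabeled samples contribute the distillation term only
        loss += cross_entropy(linear(*ce_head, features), label)
    return loss

def dual_head_predict(features, ce_head, kd_head, alpha=0.5):
    # Assumed inference rule: interpolate the two heads' probabilities.
    # alpha can be changed at test time without retraining either head.
    p_ce = softmax(linear(*ce_head, features))
    p_kd = softmax(linear(*kd_head, features))
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_ce, p_kd)]

# Toy example: 2-dimensional features, 3 classes, hand-picked weights.
features = [1.0, -0.5]
ce_head = ([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0])
kd_head = ([[0.0, 1.0], [1.0, 0.0], [-0.5, 0.5]], [0.0, 0.0, 0.0])
teacher_probs = [0.7, 0.2, 0.1]  # stand-in for a VLM teacher's output

labeled_loss = dual_head_loss(features, 0, teacher_probs, ce_head, kd_head)
unlabeled_loss = dual_head_loss(features, None, teacher_probs, ce_head, kd_head)
mixed_probs = dual_head_predict(features, ce_head, kd_head, alpha=0.3)
```

Under these assumptions, sweeping `alpha` on a validation set only re-evaluates `dual_head_predict`, which is one way to read the abstract's claim of test-time hyperparameter tuning without retraining.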

Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang • 2025

Related benchmarks

Task                  Dataset                     Result               Rank
Image Classification  ImageNet-1k (val)           --                   1453
Image Classification  ImageNet (val)              Top-1 Acc 85.9       1206
Image Classification  ImageNet A                  Top-1 Acc 64.4       553
Image Classification  ImageNet V2                 Top-1 Acc 77.8       487
Image Classification  ImageNet-R                  Top-1 Acc 82.8       474
Image Classification  ImageNet-Sketch             Top-1 Accuracy 61.7  360
Image Classification  ImageNet 1.0 (10% labeled)  Accuracy 85.9        33
Image Classification  ImageNet 1% labeled 1.0     Accuracy 84.6        21