Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

About

Semi-supervised learning (SSL) has emerged as a practical solution to data scarcity by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated remarkable zero-/few-shot performance that often surpasses SSL approaches, owing to their strong generalization. This gap raises a question: how can we effectively transfer the generalization capabilities of VLMs into task-specific models? Knowledge distillation (KD) offers a natural framework for this transfer, but we find that it suffers from gradient conflicts between the supervised and distillation losses. To address this, we propose Dual-Head Optimization (DHO), which introduces a separate prediction head for each of the two signals. We observe that DHO resolves these gradient conflicts and improves feature learning compared to single-head KD baselines, with two practical benefits: minimal computational overhead and test-time hyperparameter tuning without retraining. Extensive experiments across 15 datasets show that DHO consistently outperforms KD baselines, and its smaller student models often surpass their teachers. DHO also achieves new state-of-the-art performance on both in-distribution ImageNet semi-supervised learning and out-of-distribution generalization across ImageNet variants. We publicly release our code and model checkpoints at https://github.com/erjui/DHO to facilitate future research.
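
The dual-head idea described in the abstract can be illustrated with a minimal sketch. This is not the released implementation: the function names, the loss weight `lam`, and the inference-time parameters `alpha` (mixing weight) and `beta` (temperature on the distillation head) are hypothetical choices made for illustration. The sketch assumes a shared feature vector feeding two linear heads, one trained with cross-entropy on labels and one trained with a KL term against the teacher's probabilities, and it assumes the two heads are combined at test time by interpolating their softmax outputs.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def linear_head(features, weights, bias):
    """One prediction head: logits = W @ features + b."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Supervised loss for a single labeled example."""
    return -math.log(softmax(logits)[label])

def kl_divergence(teacher_probs, student_logits):
    """Distillation loss: KL(teacher || student) for a single example."""
    student_probs = softmax(student_logits)
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def dual_head_loss(features, ce_head, kd_head, teacher_probs, label=None, lam=0.5):
    """Each loss touches only its own head; both share `features`.

    `lam` is an assumed weighting between the two terms. Unlabeled
    examples (label=None) contribute only the distillation term.
    """
    kd_logits = linear_head(features, *kd_head)
    loss = (1.0 - lam) * kl_divergence(teacher_probs, kd_logits)
    if label is not None:
        ce_logits = linear_head(features, *ce_head)
        loss += lam * cross_entropy(ce_logits, label)
    return loss

def combine_heads(ce_logits, kd_logits, alpha=0.5, beta=1.0):
    """Assumed inference rule: interpolate the two heads' probabilities.

    `alpha` and `beta` can be changed after training, which is one way
    to read "test-time hyperparameter tuning without retraining".
    """
    p_ce = softmax(ce_logits)
    p_kd = softmax(kd_logits, temperature=beta)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_ce, p_kd)]
```

Because the supervised and distillation losses each update a different head, their gradients meet only in the shared feature extractor; sweeping `alpha` and `beta` on a validation set requires no further training under this sketch.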

Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-1k (val)	--	1498
Image Classification	ImageNet (val)	Top-1 Acc 85.9	1206
Image Classification	ImageNet V2	Top-1 Acc 77.8	749
Image Classification	ImageNet A	Top-1 Acc 64.4	698
Image Classification	ImageNet-R	Top-1 Acc 82.8	581
Image Classification	ImageNet-Sketch	Top-1 Acc 61.7	473
Image Classification	ImageNet 1.0 (10% labeled)	Accuracy 85.9	33
Image Classification	ImageNet 1% labeled 1.0	Accuracy 84.6	21
