
Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

About

Zero-shot learning (ZSL) aims to recognize unseen classes for which no visual instances are available during training. However, existing methods usually assume clean labels and overlook real-world label noise and ambiguity, which degrades performance. To bridge this gap, we propose Dynamic Visual-semantic Alignment (DVSA), a robust ZSL framework for learning from ambiguous labels. DVSA uses a bidirectional visual-semantic alignment module with attention to mutually calibrate visual features and attribute prototypes, together with an attribute-level contrastive objective grounded in Mutual Information (MI) that strengthens discriminative, semantically consistent attributes. In addition, a dynamic label disambiguation mechanism iteratively corrects noisy supervision while preserving semantic consistency, narrowing the instance-label gap and improving generalization. Extensive experiments on standard benchmarks verify that DVSA achieves stronger performance under ambiguous supervision.
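The abstract does not spell out how the dynamic label disambiguation step works. As a rough illustration only, the sketch below shows one generic way such an iterative correction is often done in partial-label learning: the model's softmax scores are restricted to each sample's candidate label set, renormalized, and blended into a running confidence vector. The function names, the `momentum` parameter, and the moving-average update are assumptions made for this example; they are not taken from DVSA.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def update_label_confidence(logits, candidate_mask, prev_conf, momentum=0.9):
    """Hypothetical disambiguation step (not the DVSA update rule).

    logits:         model scores for one sample, one per class
    candidate_mask: 1 for classes in the ambiguous candidate set, else 0
    prev_conf:      current confidence over classes (sums to 1)
    momentum:       weight kept on the previous confidence (assumed value)
    """
    probs = softmax(logits)
    # Discard probability mass assigned to non-candidate classes.
    masked = [p * c for p, c in zip(probs, candidate_mask)]
    z = sum(masked)
    if z == 0:
        # No usable candidate mass: keep the previous confidence.
        return list(prev_conf)
    target = [v / z for v in masked]
    # Moving-average update toward the model's restricted prediction.
    return [momentum * q + (1.0 - momentum) * t
            for q, t in zip(prev_conf, target)]


# Example: classes 0 and 2 are candidates; the model prefers class 0.
conf = update_label_confidence(
    logits=[2.0, 1.0, 0.0],
    candidate_mask=[1, 0, 1],
    prev_conf=[0.5, 0.0, 0.5],
    momentum=0.5,
)
```

Repeating this update over training epochs gradually shifts confidence toward the candidate label the model finds most plausible, while non-candidate classes stay at zero.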

Jiangnan Li, Linqing Huang, Xiaowen Yan, Min Gan, Wenpeng Lu, Jinfu Fan • 2026

Related benchmarks

Task                  Dataset    Metric                   Result  Rank
Image Classification  CUB        Harmonic Mean Top-1 Acc  70.8    106
Image Classification  AWA2 GZSL  H (Harmonic Mean)        75.8    49
Image Classification  SUN GZSL   Harmonic Mean            44.8    29
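
The harmonic mean reported in these benchmarks is the standard generalized zero-shot learning (GZSL) summary metric, H = 2 * S * U / (S + U), where S and U are the top-1 accuracies on seen and unseen classes. A minimal sketch of the computation follows; the function name and the input values are illustrative and are not results from this paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """GZSL harmonic mean H = 2 * S * U / (S + U).

    Returns 0.0 when both accuracies are zero to avoid division by zero.
    """
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


# Illustrative inputs only (not taken from the paper).
h = harmonic_mean(80.0, 60.0)
print(f"H = {h:.1f}")
```

Because the harmonic mean is dominated by the smaller of the two accuracies, a model cannot score well on H by favoring seen classes at the expense of unseen ones.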
