Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

About

Zero-shot learning (ZSL) aims to recognize classes for which no visual training instances are available. However, existing methods usually assume clean labels and overlook the label noise and ambiguity found in real-world data, which degrades their performance. To bridge this gap, we propose Dynamic Visual-semantic Alignment (DVSA), a robust ZSL framework for learning from ambiguous labels. DVSA uses an attention-based bidirectional visual-semantic alignment module to mutually calibrate visual features and attribute prototypes, together with an attribute-level contrastive objective grounded in mutual information (MI) that strengthens discriminative, semantically consistent attributes. In addition, a dynamic label disambiguation mechanism iteratively corrects noisy supervision while preserving semantic consistency, which narrows the gap between instances and their labels and improves generalization. Extensive experiments on standard benchmarks show that DVSA achieves stronger performance than existing methods under ambiguous supervision.
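The abstract does not give the update rule for dynamic label disambiguation, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes ambiguous supervision takes a partial-label form (each image comes with a set of candidate classes, one of which is correct) and uses a generic momentum-based progressive disambiguation update, similar in spirit to partial-label learning methods such as PRODEN: label confidences start uniform over each candidate set and are gradually moved toward the model's current class belief restricted to that set. The plain dot-product `compatibility` score is a hypothetical stand-in for DVSA's attention-based alignment module, and all function and parameter names (`init_confidence`, `update_confidence`, `momentum`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x):
    # Row-wise, numerically stable softmax over the class axis
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_confidence(candidates):
    # candidates: (N, C) 0/1 mask of admissible labels per image (assumed partial-label setting).
    # Start from a uniform weight over each image's candidate set.
    mask = candidates.astype(float)
    return mask / mask.sum(axis=1, keepdims=True)


def compatibility(feats, prototypes):
    # Hypothetical simplification: dot-product scores between visual features
    # (assumed already projected into attribute space) and class attribute prototypes.
    # This stands in for the attention-based alignment module, which is not reproduced here.
    return feats @ prototypes.T


def update_confidence(conf, scores, candidates, momentum=0.9):
    # Model belief restricted to the candidate set, renormalized per row
    probs = softmax(scores) * candidates
    probs = probs / probs.sum(axis=1, keepdims=True)
    # Move label confidences toward the current belief with a momentum update
    return momentum * conf + (1.0 - momentum) * probs


def soft_cross_entropy(scores, conf):
    # Cross-entropy of the model's predictions against the disambiguated soft targets
    z = scores - scores.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(conf * log_probs).sum(axis=1).mean())


# Toy run: 4 images, 3 classes, 5 attributes, each image with an ambiguous label set.
# Features are held fixed here; in real training they would be updated by minimizing the loss.
rng = np.random.default_rng(0)
prototypes = rng.random((3, 5))
feats = rng.random((4, 5))
candidates = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])

conf = init_confidence(candidates)
for _ in range(10):
    scores = compatibility(feats, prototypes)
    loss = soft_cross_entropy(scores, conf)
    conf = update_confidence(conf, scores, candidates)
```

Because both the current confidences and the restricted belief are zero outside the candidate set and sum to one per row, each update keeps every row a valid distribution over that image's candidates; the momentum term keeps early, unreliable predictions from overriding the given label set too quickly.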

Jiangnan Li, Linqing Huang, Xiaowen Yan, Min Gan, Wenpeng Lu, Jinfu Fan (2026)

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	CUB	Harmonic Mean Top-1 Acc	70.8	106
Image Classification	AWA2 GZSL	H (Harmonic Mean)	75.8	49
Image Classification	SUN GZSL	Harmonic Mean	44.8	29
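The reported results are harmonic means (H), the standard summary metric in generalized zero-shot learning (GZSL). H combines the top-1 accuracy on seen classes (S) and on unseen classes (U) as H = 2SU / (S + U), so a model cannot score well by doing well on only one group. The page does not list S and U separately, so the short sketch below only illustrates how H is computed; the function name `harmonic_mean` and the input values are hypothetical.

```python
def harmonic_mean(acc_seen, acc_unseen):
    # GZSL harmonic mean H of seen-class (S) and unseen-class (U) top-1 accuracies
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


print(harmonic_mean(50.0, 50.0))  # 50.0
print(harmonic_mean(90.0, 10.0))  # 18.0
```

With S = 90 and U = 10 the arithmetic mean would still be 50, but H drops to 18, which is why H is preferred when seen-class accuracy could otherwise mask poor recognition of unseen classes.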
