ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning

About

Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on visual-semantic correlations learned from seen classes. However, most current generative approaches rely heavily on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen-class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, a generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) multiple feature discriminators that employ a Wasserstein-distance-based mutual learning approach and evaluate generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our code is available at https://github.com/FouriYe/ZeroDiff_ICLR25.
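
As a rough illustration of component (1), the sketch below applies the standard forward-diffusion noising step, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, to a handful of feature vectors so that each clean sample yields several noised copies. This is an assumption about the general technique, not the authors' implementation: the function names, the linear beta schedule and its endpoints, and the uniform timestep sampling are all illustrative choices.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for step in range(num_steps):
        frac = step / max(num_steps - 1, 1)
        beta = beta_start + frac * (beta_end - beta_start)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_feature(feature, t, alpha_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + sigma * rng.gauss(0.0, 1.0) for x in feature]


def augment(features, num_steps, copies_per_feature, seed=0):
    """Turn each clean feature into several (noised feature, timestep) pairs."""
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars(num_steps)
    augmented = []
    for feature in features:
        for _ in range(copies_per_feature):
            t = rng.randrange(num_steps)  # uniform timestep, an assumption
            augmented.append((noise_feature(feature, t, alpha_bars, rng), t))
    return augmented


if __name__ == "__main__":
    # Two hypothetical 3-dimensional visual features.
    clean = [[0.2, 1.5, -0.7], [1.1, 0.0, 0.4]]
    for noised, t in augment(clean, num_steps=10, copies_per_feature=2):
        print(t, [round(v, 3) for v in noised])
```

Pairing each noised feature with its timestep matters because a discriminator that judges the diffusion process needs to know how much noise was applied.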

Zihan Ye, Shreyank N. Gowda, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang, Xiaobo Jin • 2024

Related benchmarks

Task                            Dataset    Metric              Result  Rank
Generalized Zero-Shot Learning  CUB        H Score             81.6    250
Generalized Zero-Shot Learning  SUN        H                   59.8    184
Generalized Zero-Shot Learning  AWA2       S Score             89.3    165
Zero-shot Learning              CUB        Top-1 Accuracy      87.5    144
Zero-shot Learning              SUN        Top-1 Accuracy      77.3    114
Zero-shot Learning              AWA2       Top-1 Accuracy      0.873   95
Image Classification            CUB        Unseen Top-1 Acc    80      89
Image Classification            AWA2 GZSL  Acc (Unseen)        74.7    32
Image Classification            SUN GZSL   Top-1 Acc (Unseen)  63      14
Image Classification            AWA2 ZSL   Top-1 Acc           87.3    11
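
In generalized zero-shot learning, the H score is conventionally the harmonic mean of the top-1 accuracy on seen classes (S) and on unseen classes (U). A minimal sketch of that calculation follows. The function name and the example inputs are hypothetical and are not taken from the results listed here.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """H = 2 * S * U / (S + U), with H defined as 0 when both accuracies are 0."""
    total = seen_acc + unseen_acc
    if total == 0:
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


if __name__ == "__main__":
    # Hypothetical accuracies in percent, chosen only to show the behaviour.
    print(harmonic_mean(50.0, 50.0))  # prints 50.0
    print(round(harmonic_mean(80.0, 60.0), 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high H by doing well on seen classes alone.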
