
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

About

Zero-shot learning (ZSL) recognizes unseen classes by conducting visual-semantic interactions that transfer semantic knowledge from seen classes to unseen ones, supported by semantic information (e.g., attributes). However, existing ZSL methods simply extract visual features using a pre-trained network backbone (i.e., CNN or ViT). Because the backbone lacks the guidance of semantic information, it fails to learn matched visual-semantic correspondences for representing semantic-related visual features, resulting in undesirable visual-semantic interactions. To tackle this issue, we propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT). ZSLViT mainly considers two properties across the whole network: i) discovering the semantic-related visual representations explicitly, and ii) discarding the semantic-unrelated visual information. Specifically, we first introduce semantic-embedded token learning to improve the visual-semantic correspondences via semantic enhancement and to discover the semantic-related visual tokens explicitly with semantic-guided token attention. Then, we fuse visual tokens with low semantic-visual correspondence to discard the semantic-unrelated visual information for visual enhancement. These two operations are integrated into various encoders to progressively learn semantic-related visual representations for accurate visual-semantic interactions in ZSL. Extensive experiments show that ZSLViT achieves significant performance gains on three popular benchmark datasets, i.e., CUB, SUN, and AWA2. Code is available at https://github.com/shiming-chen/ZSLViT.

Shiming Chen, Wenjin Hou, Salman Khan, Fahad Shahbaz Khan • 2024
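
The two operations named in the abstract (scoring visual tokens against semantic information, then fusing the low-scoring tokens) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `semantic_guided_fuse`, the dot-product scoring, the `keep_ratio` parameter, and the score-weighted averaging are all assumptions chosen for illustration, and the semantic-enhancement step is not modeled. The repository linked above holds the actual method.

```python
import numpy as np

def semantic_guided_fuse(tokens, semantic, keep_ratio=0.5):
    """Hypothetical sketch: keep tokens that match a semantic vector,
    merge the rest into one token.

    tokens:   (N, D) array of visual tokens
    semantic: (D,) semantic vector assumed to live in the same space
    """
    n, d = tokens.shape

    # Assumed scoring rule: scaled dot product followed by a softmax over tokens.
    logits = tokens @ semantic / np.sqrt(d)
    logits = logits - logits.max()          # numerical stability
    scores = np.exp(logits) / np.exp(logits).sum()

    # Keep the k tokens with the highest semantic correspondence.
    k = max(1, int(round(n * keep_ratio)))
    order = np.argsort(-scores)
    keep_idx, fuse_idx = order[:k], order[k:]
    kept = tokens[keep_idx]
    if fuse_idx.size == 0:
        return kept, scores

    # Fuse the low-correspondence tokens into a single score-weighted token.
    w = scores[fuse_idx]
    fused = (w[:, None] * tokens[fuse_idx]).sum(axis=0) / w.sum()
    return np.vstack([kept, fused[None, :]]), scores

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))    # 8 toy tokens of width 16
semantic = rng.normal(size=16)       # toy semantic vector
out, scores = semantic_guided_fuse(tokens, semantic, keep_ratio=0.5)
print(out.shape)                     # 4 kept tokens plus 1 fused token
```

Applying such a step inside successive encoder stages would shrink the token set progressively, which is one plausible reading of "progressive" in the title.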

Related benchmarks

Task | Dataset | Metric | Result | Rank
Generalized Zero-Shot Learning | CUB | H Score | 73.6 | 250
Generalized Zero-Shot Learning | SUN | H | 47.3 | 184
Generalized Zero-Shot Learning | AWA2 | S Score | 84.6 | 165
Zero-shot Learning | CUB | Top-1 Accuracy | 78.9 | 144
Zero-shot Learning | SUN | Top-1 Accuracy | 68.3 | 114
Zero-shot Learning | AWA2 | Top-1 Accuracy | 0.707 | 95
Image Classification | CUB | Unseen Top-1 Acc | 69.4 | 89
Zero-shot Image Classification | AWA2 (test) | Metric U | 66.1 | 46
Zero-shot Image Classification | CUB | U Score | 69.4 | 34
Image Classification | AWA2 GZSL | Acc (Unseen) | 66.1 | 32
Showing 10 of 16 rows
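
In generalized zero-shot learning, H conventionally denotes the harmonic mean of the seen-class accuracy (S) and the unseen-class accuracy (U). As a small worked example, the sketch below combines the AWA2 S and U values listed above. The resulting H is computed here for illustration, not read from the visible rows, and the helper name `harmonic_mean` is our own.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean H = 2 * S * U / (S + U), the usual GZSL summary metric."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

# AWA2 values from the benchmark rows: S = 84.6, U = 66.1
h_awa2 = harmonic_mean(84.6, 66.1)
print(round(h_awa2, 1))  # 74.2
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score a high H by doing well on seen classes alone.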
