Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
About
Generalized zero-shot learning aims to recognize both seen and unseen classes with the help of semantic information shared among different classes. This inevitably requires consistent visual-semantic alignment. Existing approaches fine-tune the visual backbone on seen-class data to obtain semantic-related visual features, which may cause overfitting to seen classes given the limited number of training images. This paper proposes a novel visual and semantic prompt collaboration framework, which uses prompt tuning techniques for efficient feature adaptation. Specifically, we design a visual prompt to integrate visual information for discriminative feature learning and a semantic prompt to integrate semantic information for visual-semantic alignment. To integrate the prompt information effectively, we further design a weak prompt fusion mechanism for the shallow layers of the network and a strong prompt fusion mechanism for the deep layers. Through the collaboration of visual and semantic prompts, we obtain discriminative, semantic-related features for generalized zero-shot image recognition. Extensive experiments demonstrate that our framework consistently achieves favorable performance on both conventional zero-shot learning and generalized zero-shot learning benchmarks compared with other state-of-the-art methods.
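To make the general idea of prompt tuning concrete, here is a toy sketch, not the authors' implementation. It assumes two learnable prompt tokens (one visual, one semantic) are prepended to patch embeddings and passed through a frozen backbone. The hypothetical `frozen_block` stands in for a transformer layer, and the semantic prompt's output is matched against class attribute vectors by cosine similarity, so unseen classes can be predicted through their semantic descriptions. The abstract does not detail the weak and strong fusion mechanisms, so the sketch does not model them. For brevity it also assumes the token dimension equals the attribute dimension; a real model would add a learned projection.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def frozen_block(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a frozen transformer block.

    Each token receives half of the sequence mean, a crude form of token
    mixing. Nothing here is trainable, mirroring a frozen backbone.
    """
    dim = len(tokens[0])
    mean = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return [[tok[d] + 0.5 * mean[d] for d in range(dim)] for tok in tokens]


def encode(patches: List[Vector], visual_prompt: Vector,
           semantic_prompt: Vector, depth: int) -> Tuple[Vector, Vector]:
    """Prepend both prompt tokens, run `depth` frozen blocks, and return
    the output states of the visual and semantic prompt tokens.

    During prompt tuning only the two prompts would be updated.
    """
    tokens = [list(visual_prompt), list(semantic_prompt)] + [list(p) for p in patches]
    for _ in range(depth):
        tokens = frozen_block(tokens)
    return tokens[0], tokens[1]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(semantic_feature: Vector, class_attributes: Dict[str, Vector]) -> str:
    """Return the class (seen or unseen) whose attribute vector is most
    similar to the semantic feature."""
    return max(class_attributes,
               key=lambda c: cosine(semantic_feature, class_attributes[c]))


if __name__ == "__main__":
    patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two toy patch embeddings
    # vis_out would feed a discriminative head in a full model (not shown).
    vis_out, sem_out = encode(patches, visual_prompt=[0.0, 0.0, 1.0],
                              semantic_prompt=[0.2, 0.2, 0.0], depth=2)
    # Toy attribute vectors; "okapi" plays the role of an unseen class.
    attributes = {"zebra": [1.0, 1.0, 0.0], "okapi": [0.0, 0.0, 1.0]}
    print(predict(sem_out, attributes))  # -> zebra
```

Because the backbone stays fixed, only a handful of prompt parameters would be trained, which is the property the abstract credits with reducing overfitting to seen classes.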
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Zero-shot Image Classification | AWA2 (test) | U | 71.8 | 46 |
| Zero-shot Image Classification | CUB | U | 72.8 | 34 |
| Image Classification | SUN Attribute (test) | U | 59.4 | 19 |
| Image Classification | AWA2 v1 (test) | U | 71.8 | 19 |
| Zero-shot Classification | SUN | U | 59.4 | 14 |
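In the standard generalized zero-shot protocol, U denotes the average per-class top-1 accuracy on test images from unseen classes, where each prediction is made over the joint set of seen and unseen labels. The sketch below shows that arithmetic under this assumption; the function name and the toy labels are hypothetical, and the values listed in the benchmarks above are not recomputed here.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def mean_per_class_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average per-class top-1 accuracy, in percent.

    `pairs` holds (true_label, predicted_label) for unseen-class test
    images, with predictions drawn from seen and unseen labels alike.
    Averaging per class first keeps large classes from dominating.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in pairs:
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted)
    if not total:
        raise ValueError("no samples given")
    return 100.0 * sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # "okapi": 1 of 2 correct (50%); "tapir": 1 of 1 correct (100%).
    samples = [("okapi", "okapi"), ("okapi", "zebra"), ("tapir", "tapir")]
    print(mean_per_class_accuracy(samples))  # -> 75.0
```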