
VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

About

Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning, but their annotation process is labor-intensive and requires expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes; however, word embeddings do not always reflect visual similarities, which results in inferior zero-shot performance. We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes class discrimination and semantic relatedness on these clusters. To associate the clusters with previously unseen classes, we use external knowledge, e.g., word embeddings, and propose a novel class relation discovery module. Through quantitative and qualitative evaluation, we demonstrate that our model discovers semantic embeddings that capture the visual properties of both seen and unseen classes. Furthermore, we show on three benchmarks that our visually-grounded semantic embeddings improve performance over word embeddings by a large margin across various zero-shot learning (ZSL) models.
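The pipeline described above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: it clusters region features with plain k-means (whereas the paper additionally enforces class discrimination and semantic relatedness), builds each seen class's embedding as a cluster-occurrence histogram, and predicts an unseen class's embedding via word-embedding similarity, a simple stand-in for the class relation discovery module. All names, sizes, and the synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4 seen classes, each represented by 60 local
# image-region feature vectors (synthetic here; CNN features in practice).
n_seen, n_regions, feat_dim, n_clusters = 4, 60, 32, 8
region_feats = rng.normal(size=(n_seen, n_regions, feat_dim))
all_feats = region_feats.reshape(-1, feat_dim)

# Step 1: cluster all seen-class region features by visual similarity
# (a minimal k-means loop; the paper's model imposes further constraints).
centroids = all_feats[rng.choice(len(all_feats), n_clusters, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(all_feats[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    for k in range(n_clusters):
        members = all_feats[assign == k]
        if len(members):
            centroids[k] = members.mean(axis=0)

# Step 2: a seen class's semantic embedding is the normalized histogram of
# how often its regions fall into each visual cluster.
assign = assign.reshape(n_seen, n_regions)
seen_embed = np.stack([
    np.bincount(assign[c], minlength=n_clusters) / n_regions
    for c in range(n_seen)
])

# Step 3: an unseen class has no images, so its embedding is predicted as a
# similarity-weighted combination of seen-class embeddings, with weights
# derived from word-embedding similarity (synthetic word vectors here).
word_dim = 16
seen_words = rng.normal(size=(n_seen, word_dim))
unseen_word = rng.normal(size=word_dim)

sims = seen_words @ unseen_word
weights = np.exp(sims - sims.max())
weights /= weights.sum()                 # softmax over seen classes
unseen_embed = weights @ seen_embed      # convex combination of histograms
```

Because the unseen-class embedding is a convex combination of normalized histograms, it stays a valid distribution over visual clusters and can be plugged into a standard ZSL model in place of hand-annotated attributes.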

Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Generalized Zero-Shot Learning | CUB | H Score | 31.5 | 250 |
| Generalized Zero-Shot Learning | SUN | H Score | 29.8 | 184 |
| Zero-Shot Learning | SUN (unseen) | Top-1 Accuracy (%) | 41.1 | 50 |
| Zero-Shot Learning | CUB (unseen) | Top-1 Accuracy (%) | 35 | 49 |
| Zero-Shot Learning | AWA2 (unseen) | Top-1 Accuracy (%) | 64 | 37 |
| Generalized Zero-Shot Learning | AWA2 (seen+unseen) | U Score | 51.2 | 10 |

Other info

Code
