Learning Granularity-Unified Representations for Text-to-Image Person Re-identification

About

Text-to-image person re-identification (ReID) aims to retrieve pedestrian images of an identity of interest using textual descriptions. The task is challenging because of both rich intra-modal variations and significant inter-modal gaps. Existing works usually ignore the difference in feature granularity between the two modalities: visual features are usually fine-grained, whereas textual features are coarse, and this mismatch is mainly responsible for the large inter-modal gaps. In this paper, we propose LGUR, an end-to-end transformer-based framework that learns granularity-unified representations for both modalities.

The LGUR framework contains two modules: a Dictionary-based Granularity Alignment (DGA) module and a Prototype-based Granularity Unification (PGU) module. In DGA, to align the granularities of the two modalities, we introduce a Multi-modality Shared Dictionary (MSD) that is used to reconstruct both visual and textual features. In addition, DGA relies on two important factors, namely cross-modality guidance and foreground-centric reconstruction, to facilitate the optimization of the MSD. In PGU, we adopt a set of shared, learnable prototypes as queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further improves ReID performance.

Comprehensive experiments show that LGUR consistently outperforms state-of-the-art methods by large margins on both the CUHK-PEDES and ICFG-PEDES datasets. Code will be released at https://github.com/ZhiyinShao-H/LGUR.
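The abstract does not give the formulas behind DGA and PGU, so the following is only a rough illustration under stated assumptions, not the LGUR implementation. It assumes both steps are plain scaled dot-product attention: a DGA-like step that re-expresses every visual or textual token as a softmax-weighted mix of the atoms of one shared dictionary (standing in for the MSD), and a PGU-like step in which shared prototypes act as queries over the reconstructed tokens. The helper names (`attend`, `reconstruct_with_dictionary`, `extract_with_prototypes`) and all numbers are hypothetical; the cross-modality guidance and foreground-centric reconstruction terms are left out entirely.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query becomes a softmax-weighted average of the values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def reconstruct_with_dictionary(tokens: Matrix, dictionary: Matrix) -> Matrix:
    # ASSUMPTION (DGA-like step): each visual or textual token is re-expressed as a
    # convex combination of the SAME dictionary atoms, so both modalities share one basis.
    # The guidance and foreground-centric terms mentioned in the abstract are omitted.
    return attend(tokens, dictionary, dictionary)


def extract_with_prototypes(tokens: Matrix, prototypes: Matrix) -> Matrix:
    # ASSUMPTION (PGU-like step): shared prototypes act as attention queries, giving one
    # pooled feature per prototype no matter how many tokens the input has.
    return attend(prototypes, tokens, tokens)


# Toy 3-d features; in the real model the dictionary and prototypes are learned.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
prototypes = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
image_tokens = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.9],
                [0.3, 0.3, 0.3], [0.1, 0.0, 0.7]]                   # 5 image patches
text_tokens = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.4, 0.5, 0.0]]  # 3 words

img_parts = extract_with_prototypes(reconstruct_with_dictionary(image_tokens, dictionary), prototypes)
txt_parts = extract_with_prototypes(reconstruct_with_dictionary(text_tokens, dictionary), prototypes)
print(len(img_parts), len(txt_parts))  # 2 2: one feature per prototype for each modality
```

Because the prototype step returns a fixed number of vectors, an image with five patches and a caption with three words end up with feature sets of the same shape, which can then be compared prototype by prototype.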

Zhiyin Shao, Xinyu Zhang, Meng Fang, Zhifeng Lin, Jian Wang, Changxing Ding• 2022

Related benchmarks

Task | Dataset | Metric | Result | Leaderboard rank
Text-to-image Person Re-identification | CUHK-PEDES (test) | Rank-1 accuracy (R-1) | 65.25 | 150
Text-based Person Search | CUHK-PEDES (test) | Rank-1 | 65.25 | 142
Text-based Person Search | ICFG-PEDES (test) | R@1 | 59.02 | 104
Text-to-Image Retrieval | CUHK-PEDES (test) | Recall@1 | 65.25 | 96
Text-to-image Person Re-identification | ICFG-PEDES (test) | Rank-1 | 0.5902 (i.e., 59.02%) | 81
Text-based Person Search | CUHK-PEDES | Recall@1 | 65.25 | 61
Text-to-image Person Re-identification | CUHK-PEDES | Rank-1 | 64.21 | 34
Text-based Person Retrieval | ICFG-PEDES | R@1 | 59.02 | 32
Text to Image | CUHK-PEDES | Rank-1 | 64.21 | 28
Text-based Person Retrieval | UFine3C (evaluation) | R@1 | 51.26 | 18

Showing 10 of 17 rows.
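The Rank-1 / R@1 / Recall@1 figures above follow the usual text-to-image person retrieval convention: the percentage of text queries whose top-ranked gallery image shows the same identity (Rank-k relaxes this to "any of the top k"). As a minimal sketch, assuming a precomputed query-by-gallery similarity matrix, the metric can be computed as below; the function name `rank_k_accuracy` and the toy scores and identity labels are illustrative only and do not come from the paper.

```python
from typing import Sequence


def rank_k_accuracy(similarity: Sequence[Sequence[float]],
                    query_ids: Sequence[int],
                    gallery_ids: Sequence[int],
                    k: int = 1) -> float:
    """Percentage of queries with at least one correct identity among the top-k gallery items."""
    hits = 0
    for scores, qid in zip(similarity, query_ids):
        # Sort gallery indices by descending similarity to this query.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if any(gallery_ids[i] == qid for i in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(query_ids)


# Toy example: 3 text queries vs. 4 gallery images (values are illustrative only).
sim = [
    [0.9, 0.1, 0.3, 0.2],  # identity 0: top match is image 0 (identity 0), a hit
    [0.2, 0.4, 0.7, 0.1],  # identity 1: top match is image 2 (identity 2), a miss; image 1 is second
    [0.1, 0.2, 0.3, 0.8],  # identity 2: top match is image 3 (identity 2), a hit
]
query_ids = [0, 1, 2]
gallery_ids = [0, 1, 2, 2]
print(rank_k_accuracy(sim, query_ids, gallery_ids, k=1))  # 2 of 3 queries hit at rank 1
print(rank_k_accuracy(sim, query_ids, gallery_ids, k=2))  # 100.0
```

Note that some leaderboards report the same quantity as a fraction (for example 0.5902) rather than a percentage.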
