ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

About

Person search by natural language aims to retrieve, from a large-scale image pool, the specific person who matches a given textual description. While most current methods treat the task as holistic matching between visual and textual features, we approach it from an attribute-alignment perspective that grounds specific attribute phrases in the corresponding visual regions. This yields robust feature learning, and with it a performance boost, because the referred identity can be accurately bundled by multiple attribute-level visual cues. Concretely, our Visual-Textual Attribute Alignment model (dubbed ViTAA) learns to disentangle a person's feature space into subspaces corresponding to attributes, using a lightweight auxiliary attribute-segmentation branch. It then aligns these visual features with the textual attributes parsed from the sentences using a novel contrastive learning loss. We validate the ViTAA framework through extensive experiments on person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performance. Code will be publicly available upon publication.
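The abstract does not spell out the exact form of ViTAA's segmentation branch or its contrastive loss, so the sketch below is only a generic illustration under our own assumptions. Per-attribute visual features are obtained by masked average pooling of a feature map over attribute segmentation masks, and each textual attribute-phrase embedding is pulled toward its matching visual attribute feature with a standard InfoNCE-style contrastive loss over cosine similarities. The function names (`masked_avg_pool`, `cosine`, `info_nce`), the temperature of 0.1, and the toy data are hypothetical; this is a swapped-in textbook loss, not the authors' implementation.

```python
import math

# Hypothetical helper: pool a feature map over one attribute's segmentation mask.
def masked_avg_pool(feature_map, mask):
    """Average the C-dim feature vectors of an H x W map over pixels where mask is 1.

    feature_map: H x W nested lists of C-dim vectors; mask: H x W nested lists of 0/1.
    Returns a zero vector if the attribute covers no pixels.
    """
    channels = len(feature_map[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(feature_map, mask):
        for feat, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += feat[c]
    return [t / count for t in total] if count else total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(visual, textual, temperature=0.1):
    """Generic InfoNCE loss (assumed, not ViTAA's exact loss): textual[i] should match
    visual[i]; the other visual attribute features act as negatives."""
    loss = 0.0
    for i, t in enumerate(textual):
        logits = [cosine(t, v) / temperature for v in visual]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_sum_exp - logits[i]  # = -log softmax probability of the positive
    return loss / len(textual)


# Toy 2 x 2 feature map with 2 channels; top row = "upper clothes", bottom row = "lower clothes".
feature_map = [[[1.0, 0.0], [1.0, 0.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
masks = {"upper": [[1, 1], [0, 0]], "lower": [[0, 0], [1, 1]]}
visual = [masked_avg_pool(feature_map, masks[name]) for name in ("upper", "lower")]

# Hypothetical phrase embeddings, e.g. for "red shirt" and "blue jeans".
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]
print(info_nce(visual, aligned) < info_nce(visual, swapped))  # True
```

Correctly paired phrases and regions give a lower loss than swapped pairs, which is the behavior any attribute-alignment objective of this kind is meant to encourage. In a real model these loops would be batched tensor operations, and the InfoNCE term is commonly written as a cross-entropy over similarity logits.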

Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang • 2020

Related benchmarks

Task | Dataset | Metric | Result | Leaderboard rank
Text-to-image Person Re-identification | CUHK-PEDES (test) | Rank-1 Accuracy (R-1) | 55.97 | 150
Text-based Person Search | CUHK-PEDES (test) | Rank-1 | 55.97 | 142
Text-based Person Search | ICFG-PEDES (test) | R@1 | 50.98 | 104
Text-to-Image Retrieval | CUHK-PEDES (test) | Recall@1 | 55.97 | 96
Text-to-image Person Re-identification | ICFG-PEDES (test) | Rank-1 | 0.5098 | 81
Text-based Person Search | CUHK-PEDES | Recall@1 | 56 | 61
Person Search | CUHK-PEDES (test) | Recall@1 | 55.97 | 47
Text-to-image Person Re-identification | CUHK-PEDES | Rank-1 | 55.97 | 34
Text-based Person Retrieval | ICFG-PEDES | R@1 | 50.98 | 32
Text to Image | CUHK-PEDES | Rank-1 | 54.92 | 28

(10 of 17 rows shown.)
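Most rows above report the same top-1 retrieval score under different names (Rank-1, R@1, Recall@1): the share of text queries whose highest-ranked gallery image shows the correct person. The ICFG-PEDES entry listed as 0.5098 appears to be the same score written as a fraction (50.98%). The minimal sketch below computes Rank-k accuracy from a query-by-gallery similarity matrix; the function name `rank_k_accuracy` and the toy data are hypothetical, and it assumes one similarity score per text-image pair plus identity labels on both sides.

```python
def rank_k_accuracy(similarity, query_ids, gallery_ids, k=1):
    """Fraction of text queries with at least one correct-identity image in their top-k.

    similarity[q][g] is the score between text query q and gallery image g
    (higher means more similar). k=1 gives Rank-1 / Recall@1.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Gallery indices sorted by descending score; sorted() is stable, so ties keep gallery order.
        ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
        if any(gallery_ids[g] == query_ids[q] for g in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy example: 3 text queries, 3 gallery images, one image per identity.
similarity = [[0.9, 0.2, 0.1],   # query for A: top-1 is A (hit)
              [0.8, 0.3, 0.1],   # query for B: top-1 is A (miss), B is ranked 2nd
              [0.1, 0.2, 0.7]]   # query for C: top-1 is C (hit)
query_ids = ["A", "B", "C"]
gallery_ids = ["A", "B", "C"]

print(f"Rank-1: {100 * rank_k_accuracy(similarity, query_ids, gallery_ids, k=1):.2f}")  # Rank-1: 66.67
print(f"Rank-2: {100 * rank_k_accuracy(similarity, query_ids, gallery_ids, k=2):.2f}")  # Rank-2: 100.00
```

Multiplying by 100 gives the percentage form used by most rows; leaving the fraction as-is gives the 0.5098-style form.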
