ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
About
Person search by natural language aims to retrieve a specific person from a large-scale image pool that matches a given textual description. While most current methods treat the task as holistic visual and textual feature matching, we approach it from an attribute-aligning perspective that allows grounding specific attribute phrases to the corresponding visual regions. This yields a performance boost through robust feature learning, in which the referred identity is accurately bundled by multiple attribute visual cues. Concretely, our Visual-Textual Attribute Alignment model (dubbed ViTAA) learns to disentangle the feature space of a person into subspaces corresponding to attributes, using a light auxiliary attribute segmentation branch. It then aligns these visual features with the textual attributes parsed from the sentences using a novel contrastive learning loss. We validate the ViTAA framework through extensive experiments on person search by natural language and by attribute-phrase queries, on which our system achieves state-of-the-art performance. Code will be publicly available upon publication.
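
To make the idea of aligning per-attribute visual features with parsed textual attributes more concrete, the following is a minimal sketch of a generic margin-based contrastive objective. It is not the loss proposed in the paper, whose exact form is not given in this abstract. The function names, the attribute keys (`"upper"`, `"lower"`), and the `margin` value are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attribute_alignment_loss(visual, textual, same_identity, margin=0.2):
    """Generic contrastive loss over attribute subspaces (illustrative only).

    visual, textual: dicts mapping a hypothetical attribute name to a feature
    vector. Only attributes present in both dicts contribute.
    same_identity: True if the image and sentence describe the same person.
    """
    shared = [name for name in visual if name in textual]
    if not shared:
        return 0.0
    total = 0.0
    for name in shared:
        sim = cosine(visual[name], textual[name])
        if same_identity:
            # Matching pair: pull the two attribute features together.
            total += 1.0 - sim
        else:
            # Non-matching pair: penalize similarity above the margin.
            total += max(0.0, sim - margin)
    return total / len(shared)


# Toy features: the "upper" attribute agrees, the "lower" attribute does not.
visual_feats = {"upper": [1.0, 0.0], "lower": [0.0, 1.0]}
text_feats = {"upper": [1.0, 0.0], "lower": [1.0, 0.0]}

positive_loss = attribute_alignment_loss(visual_feats, text_feats, same_identity=True)
negative_loss = attribute_alignment_loss(visual_feats, text_feats, same_identity=False)
```

Because the loss is averaged per attribute, a sentence that mentions only some attributes contributes only through those subspaces, which is one plausible reading of why attribute-level alignment could be more robust than a single holistic match.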
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-image Person Re-identification | CUHK-PEDES (test) | Rank-1 Accuracy (R-1) | 55.97 | 150 |
| Text-based Person Search | CUHK-PEDES (test) | Rank-1 | 55.97 | 142 |
| Text-based Person Search | ICFG-PEDES (test) | R@1 | 50.98 | 104 |
| Text-to-Image Retrieval | CUHK-PEDES (test) | Recall@1 | 55.97 | 96 |
| Text-to-image Person Re-identification | ICFG-PEDES (test) | Rank-1 | 0.5098 | 81 |
| Text-based Person Search | CUHK-PEDES | Recall@1 | 56 | 61 |
| Person Search | CUHK-PEDES (test) | Recall@1 | 55.97 | 47 |
| Text-to-image Person Re-identification | CUHK-PEDES | Rank-1 | 55.97 | 34 |
| Text-based Person Retrieval | ICFG-PEDES | R@1 | 50.98 | 32 |
| Text to Image | CUHK-PEDES | Rank-1 | 54.92 | 28 |
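
The benchmark entries above are reported as Rank-1 / Recall@1 style retrieval scores. As a rough illustration of how such a number is typically computed for text-to-image person retrieval, here is a minimal sketch. It assumes a precomputed text-to-image similarity matrix and identity labels; the function name `rank_k_accuracy` and the toy data are hypothetical and are not taken from the paper's evaluation code.

```python
def rank_k_accuracy(similarity, query_ids, gallery_ids, k=1):
    """Fraction of text queries whose top-k retrieved images share the query identity.

    similarity[i][j]: score between text query i and gallery image j.
    query_ids[i], gallery_ids[j]: person identity labels.
    """
    if len(similarity) != len(query_ids):
        raise ValueError("one similarity row is required per query")
    hits = 0
    for scores, query_id in zip(similarity, query_ids):
        # Gallery indices ordered from most to least similar.
        order = sorted(range(len(gallery_ids)), key=lambda j: scores[j], reverse=True)
        if any(gallery_ids[j] == query_id for j in order[:k]):
            hits += 1
    return hits / len(query_ids)


# Toy example: 3 text queries against a gallery of 3 images.
similarity = [
    [0.9, 0.2, 0.1],  # query 0: correct image ranked first
    [0.3, 0.4, 0.8],  # query 1: correct image ranked second
    [0.1, 0.7, 0.6],  # query 2: correct image ranked second
]
query_ids = [0, 1, 2]
gallery_ids = [0, 1, 2]

rank1 = rank_k_accuracy(similarity, query_ids, gallery_ids, k=1)
rank2 = rank_k_accuracy(similarity, query_ids, gallery_ids, k=2)
```

In this toy case only the first query retrieves its matching identity at the top position, so `rank1` is one third, while every query succeeds within the top two. Multiplying such a fraction by 100 gives a percentage on the scale used by most rows in the table.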