Learning Transferable Pedestrian Representation from Multimodal Information Supervision
About
Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images yields better performance on downstream reID tasks than pre-training on ImageNet. However, those pre-training methods are designed specifically for reID and adapt poorly to other pedestrian analysis tasks. In this paper, we propose VAL-PAT, a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information. To train our framework, we introduce three learning objectives, i.e., self-supervised contrastive learning, image-text contrastive learning, and multi-attribute classification. The self-supervised contrastive learning facilitates the learning of intrinsic pedestrian properties, while the image-text contrastive learning guides the model to focus on the appearance information of pedestrians. Meanwhile, multi-attribute classification encourages the model to recognize attributes and thereby extract fine-grained pedestrian information. We first perform pre-training on the LUPerson-TA dataset, where each image has text and attribute annotations, and then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition, and text-based person search. Extensive experiments demonstrate that our framework facilitates the learning of general pedestrian representations and thus leads to promising results on various pedestrian analysis tasks.
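The three objectives named in the abstract can be illustrated as a single combined loss. The following is a minimal NumPy sketch, not the authors' implementation: the abstract does not give the exact loss forms, so the symmetric InfoNCE formulation, the binary cross-entropy over independent attributes, the temperature of 0.07, the equal loss weights, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form): row i of `a` and row i of `b`
    are a positive pair; every other row in the batch is a negative."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N) similarity matrix

    def diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def multi_attribute_bce(logits, targets):
    """Mean binary cross-entropy over attribute labels, computed from logits
    in a numerically stable way. Treating attributes as independent binary
    labels is an assumption."""
    loss = (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()


def total_loss(view1, view2, text, attr_logits, attr_targets,
               w_ssl=1.0, w_itc=1.0, w_attr=1.0):
    """Weighted sum of the three objectives; the weights are hypothetical."""
    l_ssl = info_nce(view1, view2)   # self-supervised: two augmented views
    l_itc = info_nce(view1, text)    # image-text contrastive
    l_attr = multi_attribute_bce(attr_logits, attr_targets)
    return w_ssl * l_ssl + w_itc * l_itc + w_attr * l_attr
```

Here `view1` and `view2` stand for embeddings of two augmentations of the same image batch, `text` for embeddings of the paired descriptions, and `attr_logits` for the outputs of an attribute classification head, each with one row per image.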
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Person Re-Identification | Duke MTMC-reID (test) | Rank-1 | 91.5 | 1018 |
| Person Re-Identification | MSMT17 (test) | Rank-1 Acc | 67.5 | 499 |
| Person Re-Identification | Occluded-Duke (test) | Rank-1 Acc | 82.9 | 177 |
| Person Re-Identification | DukeMTMC (test) | mAP | 74.9 | 83 |
| Text-based Person Search | CUHK-PEDES | Recall@1 | 64.7 | 61 |
| Pedestrian Attribute Recognition | PA-100K (test) | mA | 82.3 | 40 |