
PLIP: Language-Image Pre-training for Person Representation Learning

About

Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when applied directly to person representation learning, these general pre-training methods perform unsatisfactorily because they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we design three pretext tasks: 1) Text-guided Image Colorization, which aims to establish the correspondence between person-related image regions and fine-grained color-part textual phrases; 2) Image-guided Attributes Prediction, which aims to mine fine-grained attribute information of the person's body in the image; and 3) Identity-based Vision-Language Contrast, which aims to correlate the cross-modal representations at the identity level rather than the instance level. Moreover, to implement our pre-training framework, we construct a large-scale person dataset of image-text pairs, named SYNTH-PEDES, by automatically generating textual annotations. We pre-train PLIP on SYNTH-PEDES and evaluate our models on a range of downstream person-centric tasks. PLIP not only significantly improves existing methods on all these tasks, but also shows great ability in the zero-shot and domain generalization settings. The code, dataset and weights will be released at https://github.com/Zplusdragon/PLIP
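To make the third pretext task more concrete, here is a minimal sketch of what an identity-level contrastive objective could look like. It is not the authors' implementation: the function name `identity_contrastive_loss`, the temperature value, the symmetric averaging, and the choice to average the log-likelihood over all same-identity positives are assumptions made for illustration. The only idea taken from the abstract is that positives are defined by shared identity rather than by instance pairing.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def identity_contrastive_loss(image_emb, text_emb, ids, temperature=0.07):
    """Hypothetical identity-level image-text contrastive loss.

    image_emb, text_emb: lists of equal-length float vectors, one per sample.
    ids: identity label per sample; every text whose identity matches an
         image counts as a positive for that image (and vice versa).
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(ids)

    # logits[i][j]: temperature-scaled cosine similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def one_direction(lg):
        total = 0.0
        for i in range(n):
            positives = [j for j in range(n) if ids[j] == ids[i]]
            lsm = log_softmax(lg[i])
            # Average negative log-likelihood over all same-identity positives.
            total += -sum(lsm[j] for j in positives) / len(positives)
        return total / n

    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (one_direction(logits) + one_direction(logits_t))


# Toy usage: samples 0 and 1 show the same person, sample 2 a different one.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loss = identity_contrastive_loss(emb, emb, ids=[0, 0, 1])
```

With instance-level contrast, samples 0 and 1 would be pushed apart despite showing the same person; grouping positives by identity avoids that penalty in this sketch.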

Jialong Zuo, Jiahao Hong, Feng Zhang, Changqian Yu, Hanyu Zhou, Changxin Gao, Nong Sang, Jingdong Wang • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Person Re-Identification | Market1501 (test) | Rank-1 Accuracy | 97.3 | 1264 |
| Person Re-Identification | MSMT17 (test) | Rank-1 Acc | 85.3 | 499 |
| Person Re-Identification | Market-1501 (test) | Rank-1 | 97.3 | 384 |
| Text-to-image Person Re-identification | CUHK-PEDES (test) | Rank-1 Accuracy (R-1) | 75.36 | 150 |
| Person Search | CUHK-SYSU (test) | CMC Top-1 | 0.975 | 147 |
| Person Search | PRW (test) | mAP | 57.8 | 129 |
| Human Parsing | LIP (val) | mIoU | 63.52 | 111 |
| Person Re-Identification | DukeMTMC (test) | mAP | 84.4 | 83 |
| Human Part Parsing | PASCAL-Person-Part (test) | mIoU | 73.93 | 68 |
| Text-to-image Person Re-identification | CUHK-PEDES | Rank-1 | 75.36 | 34 |
