Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

About

The pre-training task is indispensable for the text-to-image person re-identification (T2I-ReID) task. However, there are two underlying inconsistencies between these two tasks that may impact performance: (i) Data inconsistency. A large domain gap exists between the generic images/texts used in public pre-trained models and the specific person data in the T2I-ReID task. This gap is especially severe for texts, as general textual data are usually unable to describe specific people in fine-grained detail. (ii) Training inconsistency. Images and texts are pre-trained independently, despite cross-modality learning being critical to T2I-ReID. To address these issues, we present a new unified pre-training pipeline (UniPT) designed specifically for the T2I-ReID task. We first build a large-scale text-labeled person dataset, "LUPerson-T", in which pseudo-textual descriptions of images are automatically generated by the CLIP paradigm using a divide-conquer-combine strategy. Benefiting from this dataset, we then utilize a simple vision-and-language pre-training framework to explicitly align the feature spaces of the image and text modalities during pre-training. In this way, the pre-training task and the T2I-ReID task are made consistent with each other at both the data and training levels. Without any bells and whistles, our UniPT achieves competitive Rank-1 accuracy of 68.50%, 60.09%, and 51.85% on CUHK-PEDES, ICFG-PEDES, and RSTPReid, respectively. Both the LUPerson-T dataset and code are available at https://github.com/ZhiyinShao-H/UniPT.
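To make "explicitly align the feature space of the image and text modalities" concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss of the kind popularized by CLIP. The abstract does not specify the exact objective used by UniPT, so the loss form, the function names, and the `temperature` value of 0.07 are all assumptions made for illustration.

```python
import math


def cosine_similarity_matrix(image_feats, text_feats):
    """Return sims[i][j] = cosine similarity of image i and text j."""
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    images = [normalize(v) for v in image_feats]
    texts = [normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts]
            for img in images]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class of row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Average of image-to-text and text-to-image losses (assumed form).

    Pair k (image k, text k) is treated as the only positive; every other
    pairing in the batch acts as a negative.
    """
    sims = cosine_similarity_matrix(image_feats, text_feats)
    logits_i2t = [[s / temperature for s in row] for row in sims]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]  # transpose
    return 0.5 * (cross_entropy_diagonal(logits_i2t)
                  + cross_entropy_diagonal(logits_t2i))


# Toy batch: matched pairs give a much lower loss than swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(images, matched_texts))
print(symmetric_contrastive_loss(images, swapped_texts))
```

Minimizing a loss of this shape pulls each image toward its own description and away from the other descriptions in the batch, which is one way the two modalities can end up in a shared feature space.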

Zhiyin Shao, Xinyu Zhang, Changxing Ding, Jian Wang, Jingdong Wang • 2023

Related benchmarks

Task	Dataset	Metric	Result	
Text-based Person Search	CUHK-PEDES (test)	Rank-1	68.5	171
Text-to-image Person Re-identification	CUHK-PEDES (test)	Rank-1 Accuracy (R-1)	68.5	150
Text-based Person Search	RSTPReid (test)	R@1	51.85	136
Text-based Person Search	ICFG-PEDES (test)	R@1	60.09	109
Text-based Person Search	CUHK-PEDES	Recall@1	68.5	90
Text-based Person Retrieval	ICFG-PEDES	R@1	60.09	76
Text-to-image person retrieval	RSTPReid	Rank-1 Accuracy	51.85	66
Text-based Person Re-identification	RSTPReid	Rank-1 Accuracy	22.4	57
Text-based Person Search	ICFG-PEDES	R@1	60.09	47
Text-to-image person retrieval	CUHK-PEDES	R@1	68.5	28
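
The Rank-1 / R@1 / Recall@1 figures above all measure the same thing: the percentage of text queries whose top-ranked gallery image shows the correct person. A minimal sketch of that computation follows, assuming a precomputed query-by-gallery similarity matrix; the function name and the toy data are hypothetical and are not taken from the UniPT code.

```python
def rank1_accuracy(similarity, query_ids, gallery_ids):
    """Percentage of queries whose best-scoring gallery item has the same identity.

    similarity[q][g] is the score between text query q and gallery image g.
    query_ids[q] and gallery_ids[g] are person identity labels.
    """
    hits = 0
    for q_idx, row in enumerate(similarity):
        # Index of the gallery image with the highest similarity to this query.
        best = max(range(len(row)), key=row.__getitem__)
        if gallery_ids[best] == query_ids[q_idx]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: three text queries against a gallery of three images.
similarity = [
    [0.9, 0.1, 0.3],  # top match is gallery 0 (identity 1): correct
    [0.2, 0.4, 0.8],  # top match is gallery 2 (identity 3): correct
    [0.5, 0.6, 0.1],  # top match is gallery 1 (identity 2): wrong
]
query_ids = [1, 3, 3]
gallery_ids = [1, 2, 3]
print(round(rank1_accuracy(similarity, query_ids, gallery_ids), 2))
```

Benchmark protocols also fix which split is used as queries and gallery; those details are outside this sketch.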

