Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval

About

In this work, we focus on text-based person retrieval, which identifies individuals based on textual descriptions. Despite advancements enabled by synthetic data for pretraining, a significant domain gap, due to variations in lighting, color, and viewpoint, limits the effectiveness of the pretrain-finetune paradigm. To overcome this issue, we propose a unified pipeline incorporating domain adaptation at both image and region levels. Our method features two key components: Domain-aware Diffusion (DaD) for image-level adaptation, which aligns image distributions between synthetic and real-world domains, e.g., CUHK-PEDES, and Multi-granularity Relation Alignment (MRA) for region-level adaptation, which aligns visual regions with descriptive sentences, thereby addressing disparities at a finer granularity. This dual-level strategy effectively bridges the domain gap, achieving state-of-the-art performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets. The dataset, model, and code are available at https://github.com/Shuyu-XJTU/MRA.

Shuyu Yang, Yaxiong Wang, Yongrui Li, Li Zhu, Zhedong Zheng • 2025

Related benchmarks

Task	Dataset	Metric	Result
Person Re-Identification	Market-1501 (test)	Rank-1	94.39	417
Text-based Person Search	RSTPReid (test)	R@1	68.15	136
Text-based Person Retrieval	ICFG-PEDES (test)	R@1	68.93	30
Text-based Person Anomaly Search	PAB 1.0 (test)	R@1 Score	70.53	26
Text-based Person Anomaly Search	PAB (test)	R@1	70.53	23
Person Re-Identification	CUHK-PEDES (source) to ICFG-PEDES (target) (test)	Rank-1 (R1)	50.01	6
Person Re-Identification	ICFG-PEDES (source) to CUHK-PEDES (target) (test)	R1	47.66	6
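
The R@1 and Rank-1 figures above are not defined on this page. Assuming they follow the usual retrieval convention (the percentage of text queries whose top-ranked gallery image shows the correct identity), the metric can be sketched as below. The function name `recall_at_k`, its arguments, and the toy scores are illustrative assumptions, not taken from the authors' code.

```python
def recall_at_k(similarity, query_ids, gallery_ids, k=1):
    """Percentage of queries whose top-k ranked gallery items include a matching identity.

    similarity: list of rows, one per text query, each holding a score per gallery image.
    query_ids / gallery_ids: identity labels for queries and gallery images.
    (Hypothetical helper for illustration; not the paper's evaluation script.)
    """
    if not similarity:
        raise ValueError("similarity must contain at least one query row")
    hits = 0
    for row, qid in zip(similarity, query_ids):
        # Rank gallery indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A query counts as a hit if any of its top-k images shares its identity.
        if any(gallery_ids[j] == qid for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (made-up scores): 3 text queries against 4 gallery images.
sim = [
    [0.9, 0.1, 0.3, 0.2],  # identity 0; top image is index 0 (identity 0)
    [0.2, 0.4, 0.8, 0.1],  # identity 1; top image is index 2 (identity 2)
    [0.1, 0.2, 0.7, 0.6],  # identity 2; top image is index 2 (identity 2)
]
query_ids = [0, 1, 2]
gallery_ids = [0, 1, 2, 2]

r1 = recall_at_k(sim, query_ids, gallery_ids, k=1)
r2 = recall_at_k(sim, query_ids, gallery_ids, k=2)
```

In this toy case two of the three queries are matched at rank 1, and the remaining query finds its identity at rank 2, so raising `k` can only keep or increase the score.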
