
Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval

About

In this work, we focus on text-based person retrieval, which identifies individuals based on textual descriptions. Despite advancements enabled by synthetic data for pretraining, a significant domain gap, due to variations in lighting, color, and viewpoint, limits the effectiveness of the pretrain-finetune paradigm. To overcome this issue, we propose a unified pipeline incorporating domain adaptation at both image and region levels. Our method features two key components: Domain-aware Diffusion (DaD) for image-level adaptation, which aligns image distributions between synthetic and real-world domains, e.g., CUHK-PEDES, and Multi-granularity Relation Alignment (MRA) for region-level adaptation, which aligns visual regions with descriptive sentences, thereby addressing disparities at a finer granularity. This dual-level strategy effectively bridges the domain gap, achieving state-of-the-art performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets. The dataset, model, and code are available at https://github.com/Shuyu-XJTU/MRA.

Shuyu Yang, Yaxiong Wang, Yongrui Li, Li Zhu, Zhedong Zheng • 2025

Related benchmarks

Task | Dataset | Result | Rank
Person Re-Identification | Market-1501 (test) | Rank-1: 94.39 | 397
Text-based Person Search | RSTPReid (test) | R@1: 68.15 | 114
Text-based Person Retrieval | ICFG-PEDES (test) | R@1: 68.93 | 30
Text-based Person Anomaly Search | PAB (test) | R@1: 70.53 | 23
Person Re-Identification | CUHK-PEDES (source) to ICFG-PEDES (target) (test) | Rank-1 (R1): 50.01 | 6
Person Re-Identification | ICFG-PEDES (source) to CUHK-PEDES (target) (test) | R1: 47.66 | 6
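The benchmark results are reported as R@1 (Rank-1), which in text-based person retrieval usually means the percentage of text queries whose top-ranked gallery image shows the described person. The following is a minimal sketch of that metric under stated assumptions, not the authors' evaluation code: the function name `recall_at_k`, its parameters, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of text queries whose top-k ranked images contain a correct identity.

    similarity[i][j] is the score between text query i and gallery image j
    (higher means more similar). All names here are illustrative assumptions.
    """
    if not similarity:
        raise ValueError("similarity must contain at least one query row")
    hits = 0
    for scores, query_id in zip(similarity, query_ids):
        # Rank gallery indices by descending similarity and keep the top k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        # A query counts as a hit if any top-k image shares its person identity.
        if any(gallery_ids[j] == query_id for j in top_k):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries against 4 gallery images (made-up scores).
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],  # query for person 0: best match is image 0 (person 0)
    [0.2, 0.4, 0.8, 0.1],  # query for person 1: best match is image 2 (person 2)
    [0.1, 0.2, 0.7, 0.6],  # query for person 2: best match is image 2 (person 2)
]
toy_query_ids = [0, 1, 2]
toy_gallery_ids = [0, 1, 2, 2]

r1 = recall_at_k(toy_similarity, toy_query_ids, toy_gallery_ids, k=1)
r2 = recall_at_k(toy_similarity, toy_query_ids, toy_gallery_ids, k=2)
```

In this toy setup two of the three queries rank a correct image first, so R@1 is about 66.7, while every query has a correct image within its top two, so R@2 is 100. Real benchmarks may differ in details such as how ties or multiple images per identity are handled.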
