Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval
About
In this work, we focus on text-based person retrieval, which identifies individuals from textual descriptions. Despite the advances enabled by synthetic data for pretraining, a significant domain gap, caused by variations in lighting, color, and viewpoint, limits the effectiveness of the pretrain-finetune paradigm. To overcome this issue, we propose a unified pipeline that incorporates domain adaptation at both the image and region levels. Our method features two key components: Domain-aware Diffusion (DaD) for image-level adaptation, which aligns the image distribution of the synthetic domain with that of a real-world domain such as CUHK-PEDES, and Multi-granularity Relation Alignment (MRA) for region-level adaptation, which aligns visual regions with descriptive sentences and thereby addresses disparities at a finer granularity. This dual-level strategy effectively bridges the domain gap, achieving state-of-the-art performance on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets. The dataset, model, and code are available at https://github.com/Shuyu-XJTU/MRA.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Person Re-Identification | Market-1501 (test) | Rank-1 | 94.39 | 397 |
| Text-based Person Search | RSTPReid (test) | R@1 | 68.15 | 114 |
| Text-based Person Retrieval | ICFG-PEDES (test) | R@1 | 68.93 | 30 |
| Text-based Person Anomaly Search | PAB (test) | R@1 | 70.53 | 23 |
| Person Re-Identification | CUHK-PEDES (source) to ICFG-PEDES (target) (test) | Rank-1 (R1) | 50.01 | 6 |
| Person Re-Identification | ICFG-PEDES (source) to CUHK-PEDES (target) (test) | R1 | 47.66 | 6 |
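
The Rank-1 and R@1 figures in the table above are recall-at-K scores: the percentage of text queries for which an image of the correct person appears among the top K retrieved results. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The function name `recall_at_k`, its arguments, and the toy similarity scores are all hypothetical and chosen only for illustration; real evaluation protocols may differ in details such as tie handling.

```python
def recall_at_k(similarity, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Percentage of queries with a correct match in the top-k results.

    similarity  -- list of rows, one per text query; each row holds one
                   score per gallery image (higher means more similar)
    query_ids   -- person identity described by each text query
    gallery_ids -- person identity shown in each gallery image
    """
    hits_at = {k: 0 for k in ks}
    for scores, qid in zip(similarity, query_ids):
        # Rank gallery images from most to least similar to this query.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        ranked_ids = [gallery_ids[j] for j in order]
        for k in ks:
            # A query counts as a hit if any top-k image shows the right person.
            if qid in ranked_ids[:k]:
                hits_at[k] += 1
    n = len(query_ids)
    return {k: 100.0 * hits_at[k] / n for k in ks}


# Toy data (made up): 3 text queries, 4 gallery images, 3 identities.
toy_gallery_ids = [0, 1, 2, 1]
toy_query_ids = [0, 1, 2]
toy_similarity = [
    [0.9, 0.1, 0.2, 0.3],  # correct person ranked 1st
    [0.8, 0.2, 0.1, 0.7],  # correct person ranked 2nd
    [0.5, 0.4, 0.1, 0.3],  # correct person ranked 4th
]
toy_scores = recall_at_k(toy_similarity, toy_query_ids, toy_gallery_ids, ks=(1, 2, 4))
print(toy_scores)
```

In this toy case one of three queries is a hit at K=1, two at K=2, and all three at K=4, so the scores rise with K as expected.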