
Unified Multi-Dataset Training for TBPS

About

Text-Based Person Search (TBPS) has seen significant progress with vision-language models (VLMs), yet it remains constrained by limited training data and by the fact that VLMs are not inherently pre-trained for pedestrian-centric recognition. Existing TBPS methods therefore rely on dataset-centric fine-tuning to handle distribution shift, which yields a separate, independently trained model for each dataset. While synthetic data can supply the scale needed to fine-tune VLMs, it does not remove the need for dataset-specific adaptation. This motivates a fundamental question: can we train a single unified TBPS model across multiple datasets? We show that naive joint training over all datasets remains sub-optimal because current training paradigms do not scale to a large number of unique person identities and are vulnerable to noisy image-text pairs. To address these challenges, we propose Scale-TBPS with two contributions: (i) a noise-aware unified dataset curation strategy that cohesively merges diverse TBPS datasets; and (ii) a scalable discriminative identity learning framework that remains effective under a large number of unique identities. Extensive experiments on CUHK-PEDES, ICFG-PEDES, RSTPReid, IIITD-20K, and UFine6926 demonstrate that a single Scale-TBPS model outperforms both models optimized for individual datasets and naive joint training.
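The first contribution merges several TBPS datasets into one training set. The paper's actual curation strategy is not described on this page, so the following is only a minimal sketch under stated assumptions: each image-text pair already carries a noise score (a hypothetical field, e.g. derived from some image-text similarity), identity labels are local to their source dataset and are assumed not to overlap across datasets, and pairs above a hypothetical `max_noise` threshold are dropped. It illustrates one generic step any such merge needs: remapping per-dataset identities into a single global label space, so that person 0 in two different datasets does not collapse into one identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    person_id: int      # identity label, local to its source dataset
    image_path: str
    caption: str
    noise_score: float  # hypothetical: higher means more likely mismatched


def merge_datasets(datasets, max_noise=0.5):
    """Merge several TBPS datasets into one training set (illustrative only).

    datasets: dict mapping a dataset name to its list of Pair objects.
    Returns (merged, num_identities), where merged holds
    (global_id, image_path, caption) tuples.
    """
    global_ids = {}  # (dataset name, local id) -> global id
    merged = []
    for name in sorted(datasets):  # sorted for deterministic ID assignment
        for pair in datasets[name]:
            if pair.noise_score > max_noise:
                continue  # drop pairs flagged as likely noisy (hypothetical rule)
            key = (name, pair.person_id)
            gid = global_ids.setdefault(key, len(global_ids))
            merged.append((gid, pair.image_path, pair.caption))
    return merged, len(global_ids)


data = {
    "A": [Pair(0, "a0.jpg", "man in red", 0.1),
          Pair(0, "a1.jpg", "red shirt", 0.9),
          Pair(1, "a2.jpg", "woman", 0.2)],
    "B": [Pair(0, "b0.jpg", "child", 0.0)],
}
merged, num_ids = merge_datasets(data)
print(merged, num_ids)
```

Assigning a global ID only after filtering means an identity whose pairs are all dropped does not occupy a label, so the identity count matches what the identity-learning stage will actually see.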

Nilanjana Chatterjee, Sidharatha Garg, A V Subramanyam, Brejesh Lall • 2026

Related benchmarks

Task | Dataset | Metric | Result | Leaderboard rank
Text-based Person Search | CUHK-PEDES (test) | Rank-1 | 77.91 | 142
Text-based Person Search | ICFG-PEDES (test) | R@1 | 68.24 | 104
Text-based Person Search | RSTPReid (test) | R@1 | 71.7 | 85
Text-based Person Retrieval | UFine6926 (test) | R@1 | 54.1 | 11
Text-based Person Search | IIITD-20K (test) | R@1 | 0.852 | 5
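Rank-1 / R@1 in the table is top-1 recall for text-to-image retrieval: the fraction of text queries whose highest-scoring gallery image shows the same identity. Below is a minimal sketch of the general R@K definition in plain Python, assuming a precomputed query-by-gallery similarity matrix and per-item identity labels; it is the textbook metric, not code from the paper's evaluation pipeline. The function returns a fraction in [0, 1]; multiply by 100 for the percentage scale used in most rows above.

```python
def recall_at_k(sim, query_ids, gallery_ids, k=1):
    """R@K for text-to-image retrieval.

    sim: list of rows, sim[i][j] = similarity of text query i to gallery image j.
    query_ids: identity label of each text query.
    gallery_ids: identity label of each gallery image.
    """
    hits = 0
    for row, qid in zip(sim, query_ids):
        # Rank gallery images by descending similarity (stable on ties).
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A hit if any of the top-k images shares the query's identity.
        if any(gallery_ids[j] == qid for j in ranked[:k]):
            hits += 1
    return hits / len(query_ids)


sim = [[0.9, 0.1, 0.3],
       [0.2, 0.8, 0.7],
       [0.4, 0.5, 0.6]]
query_ids = [1, 2, 3]
gallery_ids = [1, 3, 2]
print(recall_at_k(sim, query_ids, gallery_ids, k=1))  # only query 0 hits at k=1
print(recall_at_k(sim, query_ids, gallery_ids, k=2))  # all queries hit at k=2
```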
