DiCo: Disentangled Concept Representation for Text-to-image Person Re-identification

About

Text-to-image person re-identification (TIReID) aims to retrieve person images from a large gallery given free-form textual descriptions. TIReID is challenging due to the substantial modality gap between visual appearances and textual expressions, as well as the need to model fine-grained correspondences that distinguish individuals with similar attributes such as clothing color, texture, or outfit style. To address these issues, we propose DiCo (Disentangled Concept Representation), a novel framework that achieves hierarchical and disentangled cross-modal alignment. DiCo introduces a shared slot-based representation, where each slot acts as a part-level anchor across modalities and is further decomposed into multiple concept blocks. This design enables the disentanglement of complementary attributes (e.g., color, texture, shape) while maintaining consistent part-level correspondence between image and text. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate that our framework achieves competitive performance with state-of-the-art methods, while also enhancing interpretability through explicit slot- and block-level representations for more fine-grained retrieval results.
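
The slot-and-block idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-width split of each slot into blocks, and the use of cosine similarity are all assumptions made for illustration, and the slot vectors here are random placeholders rather than encoder outputs.

```python
import numpy as np

def split_into_blocks(slots, num_blocks):
    """Reshape (num_slots, dim) slot vectors into
    (num_slots, num_blocks, dim // num_blocks) concept blocks.
    The equal-width split is an assumption, not taken from the paper."""
    num_slots, dim = slots.shape
    if dim % num_blocks != 0:
        raise ValueError("dim must be divisible by num_blocks")
    return slots.reshape(num_slots, num_blocks, dim // num_blocks)

def block_similarity(image_slots, text_slots, num_blocks):
    """Cosine similarity for each (slot, block) pair between one image
    and one text. Slot k of the image is compared only with slot k of
    the text, mirroring the shared part-level anchors."""
    img = split_into_blocks(image_slots, num_blocks)
    txt = split_into_blocks(text_slots, num_blocks)
    img = img / (np.linalg.norm(img, axis=-1, keepdims=True) + 1e-8)
    txt = txt / (np.linalg.norm(txt, axis=-1, keepdims=True) + 1e-8)
    return (img * txt).sum(axis=-1)  # shape: (num_slots, num_blocks)

# Hypothetical setup: 4 slots (parts), 12-dim each, 3 blocks per slot
# (for example color, texture, shape).
rng = np.random.default_rng(0)
image_slots = rng.normal(size=(4, 12))
text_slots = rng.normal(size=(4, 12))

sim = block_similarity(image_slots, text_slots, num_blocks=3)
score = sim.mean()  # one possible way to pool into a retrieval score
```

Keeping the per-slot, per-block similarity matrix before pooling is what would make such a score inspectable: each entry indicates how well one attribute of one part matches across modalities.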

Giyeol Kim, Chanho Eom • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Text-based Person Search	CUHK-PEDES (test)	Rank-1	77.21	171
Text-to-image Person Re-identification	CUHK-PEDES (test)	Rank-1 Accuracy (R-1)	77.21	150
Text-based Person Search	RSTPReid (test)	R@1	67.84	136
Text-to-Image Retrieval	CUHK-PEDES (test)	Recall@1	77.21	114
Text-to-image person retrieval	RSTPReid	Rank-1 Accuracy	67.84	66
Text-based Person Re-identification	RSTPReid	Rank-1 Accuracy	67.84	57
Text-to-image Person Re-identification	CUHK-PEDES	Rank-1	77.21	51
Text-based Person Re-identification	ICFG-PEDES	R@1	67.81	36
Text-to-image Person Re-identification	ICFG-PEDES 58 (test)	Rank-1	67.81	15
Text-to-image Person Re-identification	RSTPReid 59 (test)	Rank-1 Recall	67.84	12
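
The Rank-1 / R@1 figures above follow the usual retrieval definition: the share of text queries whose top-ranked gallery image has the same identity as the query. A minimal sketch of that computation, assuming the reported values are percentages and using a hypothetical function name and toy data:

```python
import numpy as np

def rank1_accuracy(similarity, query_ids, gallery_ids):
    """Percentage of queries whose most similar gallery item shares
    the query's identity label.

    similarity:  (num_queries, num_gallery) text-to-image scores
    query_ids:   identity label for each text query
    gallery_ids: identity label for each gallery image
    """
    similarity = np.asarray(similarity)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    top1 = similarity.argmax(axis=1)  # best gallery index per query
    return float((gallery_ids[top1] == query_ids).mean() * 100.0)

# Toy data (made up): 3 text queries against 3 gallery images.
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0 -> gallery 0 (id 7), correct
    [0.2, 0.4, 0.8],  # query 1 -> gallery 2 (id 8), correct
    [0.6, 0.5, 0.1],  # query 2 -> gallery 0 (id 7), wrong (wants id 9)
]
toy_query_ids = [7, 8, 9]
toy_gallery_ids = [7, 9, 8]

toy_rank1 = rank1_accuracy(toy_similarity, toy_query_ids, toy_gallery_ids)
```

In this toy case two of the three queries retrieve the correct identity first, so the score is two thirds expressed as a percentage.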

