
DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification

About

Both fine-grained discriminative details and global semantic features can help address person re-identification challenges such as occlusion and pose variation. Vision foundation models (e.g., DINO) excel at mining local textures, while vision-language models (e.g., CLIP) capture strong global semantic differences. Existing methods predominantly rely on a single paradigm, neglecting the potential benefits of integrating the two. In this paper, we analyze the complementary roles of these two architectures and propose a framework that combines their strengths through a Dual-Regularized Bidirectional Transformer (DRFormer). The dual-regularization mechanism ensures diverse feature extraction and achieves a better balance between the contributions of the two models. Extensive experiments on five benchmarks show that our method effectively harmonizes local and global representations, achieving competitive performance against state-of-the-art methods.

Ying Shu, Pujian Zhan, Huiqi Yang, Hehe Fan, Youfang Lin, Kai Lv • 2026

Related benchmarks

Task                      Dataset        Result            Rank
Person Re-Identification  Market 1501    mAP 92.9          999
Person Re-Identification  MSMT17         mAP 0.787         404
Person Re-Identification  DukeMTMC       R1 Accuracy 92.5  120
Person Re-Identification  Occluded-Duke  mAP 0.653         97
Person Re-Identification  CUHK03 NP      Rank-1 89.6       64
