From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

About

CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global [CLS] token optimized for image-text alignment rather than spatial selectivity, making representations fragile under occlusion and cross-camera variation. We propose SAGA-ReID, which reconstructs identity representations by aligning intermediate patch tokens with anchor vectors parameterized in CLIP's text embedding space, emphasizing spatially stable evidence while suppressing corrupted or absent regions, without requiring textual descriptions of individual images. Controlled experiments isolate the aggregation mechanism under two qualitatively distinct conditions: synthetic masking, where identity signal is absent, and realistic human distractors, where an overlapping person introduces semantically confusing signal. In both conditions, SAGA's advantage over global pooling grows substantially as occlusion increases. Benchmark evaluations confirm consistent gains over CLIP-ReID across standard and occluded settings, with the largest improvements where global pooling is most unreliable: up to +10.6 Rank-1 on occluded benchmarks. SAGA's aggregation outperforms dedicated sequential patch aggregation on a stronger backbone, confirming that structured reconstruction addresses a bottleneck that backbone quality and architectural complexity alone cannot resolve. Code available at https://github.com/ipl-uw/Structured-Anchor-Guided-Aggregation-for-ReID.
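To make the contrast between global pooling and anchor-guided aggregation concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the softmax-over-cosine-similarity weighting, the `temperature` value, the function names, and the toy data are all assumptions chosen for illustration. The linked repository has the actual method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def global_mean_pool(patches):
    """Baseline: every patch contributes equally, occluded or not."""
    return patches.mean(axis=0)


def anchor_guided_aggregate(patches, anchors, temperature=0.07):
    """Hypothetical anchor-guided pooling (an assumption, not the paper's code).

    patches: (N, D) patch tokens; anchors: (K, D) anchor vectors.
    Each anchor takes a softmax-weighted average of the patches, with
    weights derived from cosine similarity between anchor and patch.
    Returns the pooled features (K, D) and the weights (K, N).
    """
    sim = l2_normalize(anchors) @ l2_normalize(patches).T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ patches, weights


# Toy data: two anchors and five patches in a 4-dimensional space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
patches = np.array([[1.0, 0.0, 0.0, 0.0],   # matches anchor 0
                    [0.9, 0.1, 0.0, 0.0],   # mostly matches anchor 0
                    [0.0, 1.0, 0.0, 0.0],   # matches anchor 1
                    [0.0, 0.0, 1.0, 0.0],   # stand-in for an occluder patch
                    [0.0, 0.0, 0.0, 1.0]])  # stand-in for an occluder patch

pooled, weights = anchor_guided_aggregate(patches, anchors)
baseline = global_mean_pool(patches)

print("weights per anchor:\n", weights.round(4))
print("anchor-guided features:\n", pooled.round(4))
print("global mean feature:\n", baseline.round(4))
```

In this toy setup the two occluder patches receive almost no weight from either anchor, while the global mean mixes them in at full strength. That is the intuition the abstract describes, shown here only under the stated assumptions.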

Aotian Zheng, Winston Sun, Bahaa Alattar, Vitaly Ablavsky, Jenq-Neng Hwang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Person Re-Identification | MSMT17 | mAP 0.784 | 546 |
| Person Re-Identification | DukeMTMC | R1 Accuracy 92.3 | 206 |
| Person Re-Identification | Market1501 | mAP 0.927 | 143 |
| Person Re-Identification | Occluded-Duke | mAP 0.708 | 131 |
| Person Re-Identification | Occluded-reID | R-1 94.5 | 104 |
| Person Re-Identification | P-DukeMTMC | Rank-1 Acc 94.4 | 23 |
| Person Re-Identification | Occluded-Market | Rank-1 Accuracy 90.1 | 17 |
| Classification | DomainNet (held-out target) | Average Accuracy 61 | 3 |