
Global-Local Temporal Representations For Video Person Re-Identification

About

This paper proposes the Global-Local Temporal Representation (GLTR) to exploit multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated into the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body-part cues or metric learning on four widely used video ReID datasets. For instance, it achieves a Rank-1 accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state of the art.
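To make the pipeline described above concrete, here is a minimal NumPy sketch of the two stages: parallel dilated temporal convolutions for short-term cues, followed by temporal self-attention for long-term relations, with temporal average pooling producing the final sequence descriptor. This is an illustrative approximation, not the authors' implementation: the kernel weights are random stand-ins for learned parameters, and the real model uses learned query/key projections inside the self-attention.

```python
import numpy as np

def dilated_temporal_conv(x, w, dilation):
    """Depthwise temporal convolution over frame features.

    x: (T, D) per-frame feature vectors; w: (k, D) temporal kernel.
    Zero-pads so the output keeps T frames.
    """
    T, D = x.shape
    k = w.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(T):
        for i in range(k):
            out[t] += w[i] * xp[t + i * dilation]
    return out

def temporal_self_attention(x):
    """Scaled dot-product attention across frames (T, D) -> (T, D).

    Frames similar to many others get reinforced; occluded/noisy
    frames receive low attention weight.
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                    # (T, T) frame affinities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)          # row-wise softmax
    return attn @ x

def gltr(x, dilations=(1, 2, 3), kernel=3, seed=0):
    """Sketch of GLTR: short-term dilated convs + long-term attention.

    Random kernels here are placeholders for learned weights.
    """
    rng = np.random.default_rng(seed)
    short = sum(
        dilated_temporal_conv(
            x, 0.1 * rng.standard_normal((kernel, x.shape[1])), r)
        for r in dilations)                          # parallel dilation rates
    long_ = temporal_self_attention(x + short)       # long-term relations
    return long_.mean(axis=0)                        # temporal average pooling
```

For example, a clip of 8 frames with 16-dimensional per-frame features yields a single 16-dimensional sequence descriptor that can be compared across identities with a Euclidean or cosine distance.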

Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Person Re-ID | MARS | Rank-1 Acc | 87.02 | 106 |
| Video Person Re-ID | iLIDS-VID | Rank-1 | 86 | 80 |
| Person Re-Identification | PRID 2011 (test) | Rank-1 | 95.5 | 48 |
| Video Person Re-Identification | MARS (test) | Rank-1 | 87 | 35 |
| Video Person Re-Identification | DukeMTMC-VideoReID | Rank-1 Accuracy | 96.3 | 26 |
| Video Person Re-Identification | iLIDS-VID (test) | Rank-1 | 86 | 25 |
| Video Person Re-Identification | G2A-VReID Ground to Aerial | mAP | 50.1 | 25 |
| Video Person Re-Identification | PRID 2011 | Rank-1 Accuracy | 95.5 | 23 |
| Video Person Re-Identification | MARS v1 (test) | mAP | 85.8 | 21 |
| Video Person Re-Identification | Market-1501 v1 (test) | Rank-1 | 87 | 21 |

Showing 10 of 17 rows
