
Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification

About

Video-based person re-identification (Re-ID) aims to match video sequences of pedestrians across non-overlapping cameras. The practical yet challenging part of the task is embedding the spatial and temporal information of a video into its feature representation. While most existing methods learn video characteristics by aggregating image-wise features and designing attention mechanisms in neural networks, they only explore the correlation between frames at high-level features. In this work, we aim to refine the intermediate features as well as the high-level features with non-local attention operations, and we make two contributions. (i) We propose a Non-local Video Attention Network (NVAN) to incorporate video characteristics into the representation at multiple feature levels. (ii) We further introduce a Spatially and Temporally Efficient Non-local Video Attention Network (STE-NVAN) to reduce the computational complexity by exploring the spatial and temporal redundancy present in pedestrian videos. Extensive experiments show that our NVAN outperforms state-of-the-art methods by 3.8% in rank-1 accuracy on the MARS dataset and confirm that our STE-NVAN has a much lower computational footprint than existing methods.

Chih-Ting Liu, Chih-Wei Wu, Yu-Chiang Frank Wang, Shao-Yi Chien • 2019
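To make the idea of a non-local attention operation concrete, here is a minimal sketch in plain Python. It is not the authors' NVAN implementation: it assumes identity embeddings (no learned projections), dot-product similarity normalized with a softmax, and a residual connection, and the function name `non_local_attention` is hypothetical. Each input vector stands for one spatio-temporal position of a video feature map, so every position is refined using all positions in all frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_attention(features):
    """Refine each position with a weighted sum over all positions.

    features: list of N vectors of length C, one per spatio-temporal
    position (frames x height x width flattened to N).
    Assumption: identity embeddings instead of learned projections.
    """
    refined = []
    for x_i in features:
        # Similarity between position i and every position j (dot product).
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        weights = softmax(scores)
        # Aggregate features from all positions, weighted by similarity.
        aggregated = [
            sum(w * x_j[c] for w, x_j in zip(weights, features))
            for c in range(len(x_i))
        ]
        # Residual connection: keep the original feature and add the context.
        refined.append([x + a for x, a in zip(x_i, aggregated)])
    return refined


if __name__ == "__main__":
    # Toy clip: 2 frames x 2 positions, 3 channels each (made-up values).
    clip = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]]
    for vector in non_local_attention(clip):
        print([round(v, 3) for v in vector])
```

The nested loops make the quadratic cost in the number of positions visible: every position attends to every other one. That cost is what motivates reducing spatial and temporal redundancy, as the abstract describes for STE-NVAN.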

Related benchmarks

Task | Dataset | Result | Rank
Video Person Re-ID | MARS | Rank-1 Acc 88.9 | 106
Person Re-Identification | MARS (test) | Rank-1 88.9 | 72
Person Re-Identification | MARS | Rank-1 90 | 67
Video Person Re-Identification | DukeMTMC-VideoReID | Rank-1 Accuracy 95.2 | 26
Video-to-Video Person Re-identification | MARS (test) | Top-1 Accuracy 90 | 22
Video Person Re-Identification | Market-1501 v1 (test) | Rank-1 90 | 21
Video Person Re-Identification | MARS v1 (test) | mAP 82.3 | 21
Image-to-Video Person Re-identification | DukeMTMC-VideoReID (test) | Top-1 Acc 95.2 | 16
Video-based Person Re-identification | DukeV | R1 96.3 | 15
Video-to-shop retrieval | MultiDeepFashion 2 (test) | T-1 Accuracy 22 | 13
Showing 10 of 15 rows
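
The benchmark rows above report rank-1 accuracy and mAP. As a rough illustration of how these two retrieval metrics are commonly computed, the sketch below ranks gallery items by distance for each query. It is a simplified version under stated assumptions: the function name `rank1_and_map` and the toy data are hypothetical, and the same-camera filtering that Re-ID evaluation protocols usually apply is omitted.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute rank-1 accuracy and mean average precision (mAP).

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Queries whose identity does not
    appear in the gallery are skipped.
    """
    hits = 0
    ap_sum = 0.0
    valid = 0
    for q, row in enumerate(dist):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue
        valid += 1
        # Rank-1: is the single best-ranked gallery item a correct match?
        hits += int(matches[0])
        # Average precision: mean of precision at each correct match.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a matching identity in the gallery")
    return hits / valid, ap_sum / valid


if __name__ == "__main__":
    # Toy data (made up): 2 queries, 3 gallery items.
    distances = [[0.1, 0.9, 0.5], [0.5, 0.1, 0.9]]
    rank1, mean_ap = rank1_and_map(distances, [1, 1], [1, 2, 1])
    print(f"rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

Rank-1 only looks at the top result, while mAP rewards placing all correct matches early in the ranking, which is why the two numbers for the same model and dataset can differ noticeably.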
