Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification

About

Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to redundancy among frames, newly revealed appearance, occlusion, and motion blur. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. To determine the contribution/importance of a spatio-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, for each feature node we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves state-of-the-art performance on three benchmark datasets.

Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen • 2020
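
The reference-aided attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the reference nodes are obtained by averaging the frame features over time, that relations are plain dot products, and that a single linear map (`w_attn`, random here in place of learned weights) stands in for the convolutional attention head. The function name `reference_aided_aggregate` and all shapes are hypothetical, and only a single granularity is shown.

```python
import numpy as np


def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def reference_aided_aggregate(feats, w_attn):
    """Aggregate clip features into one video-level vector.

    feats:  (T, H, W, C) per-frame feature maps.
    w_attn: (C + H*W, C) stand-in for the learned attention head (assumption).
    """
    T, H, W, C = feats.shape
    # Every spatio-temporal position is one feature node.
    nodes = feats.reshape(T * H * W, C)
    # Assumption: reference nodes are the temporal average of the frame features.
    refs = feats.mean(axis=0).reshape(H * W, C)
    # Pairwise correlations of each node with every reference node.
    relations = nodes @ refs.T                              # (N, S)
    # Stack the relations with the feature itself, then infer attention.
    stacked = np.concatenate([nodes, relations], axis=1)    # (N, C + S)
    scores = stacked @ w_attn                               # (N, C)
    attn = softmax(scores, axis=0)                          # normalise over nodes
    # Attention-weighted sum over all nodes gives the video-level feature.
    video_feat = (attn * nodes).sum(axis=0)                 # (C,)
    return video_feat, attn


rng = np.random.default_rng(0)
T, H, W, C = 4, 2, 3, 8
feats = rng.standard_normal((T, H, W, C))
w_attn = rng.standard_normal((C + H * W, C)) * 0.1
video_feat, attn = reference_aided_aggregate(feats, w_attn)
print(video_feat.shape, attn.shape)
```

Because the attention is normalised over all nodes for each channel, redundant frames share weight rather than dominating the aggregated feature. A multi-granularity variant would repeat the same steps on channel groups pooled at several spatial resolutions.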

Related benchmarks

Task	Dataset	Metric	Result (%)
Video Person Re-ID	MARS	Rank-1 Acc	88.8	106
Video Person Re-ID	iLIDS-VID	Rank-1	88.6	80
Person Re-Identification	MARS (test)	Rank-1	88.8	72
Video Person Re-Identification	PRID 2011	Rank-1 Accuracy	95.9	23
Person Re-Identification	iLIDS-VID (test)	Rank-1 Recall	88.6	6
