Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

About

In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task that aims to retrieve a pedestrian of interest across non-overlapping cameras. Transformer-based models have recently made significant progress on this task, but they still overlook fine-grained, part-informed information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to address this problem. In a transformer-based architecture, each pedestrian image is split into many patches, which are fed to transformer layers to obtain the image's feature representation. To capture fine-grained information, the paper applies vertical and horizontal division to these patches, generating human parts in different directions that provide more fine-grained information. To fuse multi-scale feature representations, it presents a pyramid structure containing global-level information and many pieces of local-level information at different scales. The feature pyramids of all pedestrian images from the same video are fused to form the final multi-direction and multi-scale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show that the proposed PiT achieves state-of-the-art performance, and extensive ablation studies demonstrate the superiority of the proposed pyramid structure. The code is available at https://git.openi.org.cn/zangxh/PiT.git.
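The division-and-pyramid idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked in the abstract for that). Everything specific here is an assumption made for illustration: the function names `split_parts`, `frame_pyramid`, and `video_feature`, the use of mean pooling to summarize each part, the scales `(2, 4)`, averaging across frames as the fusion step, and the convention that "vertical" means splitting along the height of the patch grid.

```python
import numpy as np

def split_parts(tokens, grid_hw, n_parts, direction):
    """Mean-pool a patch-token grid into n_parts stripes along one direction.

    tokens: (H*W, D) patch features for one image, in row-major grid order.
    direction: "vertical" splits along the height axis, "horizontal" along
    the width axis (a naming convention assumed for this sketch).
    """
    h, w = grid_hw
    grid = tokens.reshape(h, w, -1)                      # (H, W, D)
    axis = 0 if direction == "vertical" else 1
    chunks = np.array_split(grid, n_parts, axis=axis)
    # Summarize each part by averaging its patches (assumed pooling choice).
    return np.stack([c.mean(axis=(0, 1)) for c in chunks])  # (n_parts, D)

def frame_pyramid(tokens, grid_hw, scales=(2, 4)):
    """Stack one global feature with part features at several scales."""
    levels = [tokens.mean(axis=0, keepdims=True)]        # global level, (1, D)
    for n_parts in scales:
        for direction in ("vertical", "horizontal"):
            levels.append(split_parts(tokens, grid_hw, n_parts, direction))
    return np.concatenate(levels, axis=0)                # (1 + 2*sum(scales), D)

def video_feature(frames, grid_hw, scales=(2, 4)):
    """Fuse per-frame pyramids by averaging over frames (assumed fusion)."""
    return np.mean([frame_pyramid(f, grid_hw, scales) for f in frames], axis=0)

# Usage: 3 frames, an 8x4 patch grid, 5-dimensional patch features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 8 * 4, 5))
feature = video_feature(frames, grid_hw=(8, 4))
print(feature.shape)  # (13, 5): 1 global row + 2 directions * (2 + 4) parts
```

In a real model the patch features would come from transformer layers rather than random numbers, and the pooling and fusion steps would likely be learned rather than simple averages.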

Xianghao Zang, Ge Li, Wei Gao • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Person Re-ID | MARS | Rank-1 Acc | 90.22 | 106 |
| Video Person Re-ID | iLIDS-VID | Rank-1 | 92.07 | 80 |
| Video Person Re-Identification | MARS v1 (test) | mAP | 86.8 | 41 |
| Gait Recognition | CCPG | CL | 41 | 32 |
| Video-based Person Re-identification | iLIDS-VID v1 (test) | Rank-1 Accuracy | 92.1 | 18 |
| Person Recognition | MEVID | Rank-1 Acc | 34.2 | 18 |
| Video-based Person Re-identification | DanceVReID v1 (test) | mAP | 42.3 | 14 |
| Video-based Person Re-identification | SportsVReID v1 (test) | mAP | 70.9 | 13 |
