Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
About
In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets covering 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) a large difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID, termed VSLA-CLIP. It transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks. To further reduce the large discrepancy across platforms, we also devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset.
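As a quick sanity check on the dataset scale, the averages implied by the figures in the abstract can be worked out directly. This is only arithmetic on the three quoted totals. The variable names are illustrative, and the actual per-identity or per-tracklet distribution is not stated here.

```python
# Totals quoted in the abstract.
num_images = 185_907
num_tracklets = 5_576
num_identities = 2_788

# Averages only; the real distribution across identities is not given.
tracklets_per_identity = num_tracklets / num_identities
images_per_tracklet = num_images / num_tracklets

print(f"tracklets per identity: {tracklets_per_identity:.2f}")
print(f"images per tracklet:    {images_per_tracklet:.2f}")
```

The totals work out to an average of exactly two tracklets per identity and roughly 33 images per tracklet.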
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Person Re-Identification | MARS v1 (test) | mAP | 88.2 | 41 |
| Video-based Person Re-identification | iLIDS-VID v1 (test) | Rank-1 Accuracy | 95.3 | 18 |
| Video-based Person Re-identification | DanceVReID v1 (test) | mAP | 52.5 | 14 |
| Video-based Person Re-identification | SportsVReID v1 (test) | mAP | 74.4 | 13 |
| Person Re-Identification | DetReIDX A→G v1 | mAP | 41.63 | 11 |
| Person Re-Identification | DetReIDX G→A v1 | mAP | 26.26 | 11 |
| Person Re-Identification | DetReIDX A→A v1 | mAP | 13.83 | 11 |
| Vehicle Re-identification | DetReIDX V1 | mAP-3 | 28.11 | 9 |
| Vehicle Re-identification | VReID-XFD A→A | Rank-1 (R1) | 15.96 | 6 |
| Vehicle Re-identification | VReID-XFD A→G | Rank-1 Accuracy | 28.96 | 6 |
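
The results above are reported as mAP and Rank-1 accuracy. For readers unfamiliar with these retrieval metrics, the following is a minimal sketch of how they are commonly computed from a query-to-gallery distance matrix. It is not the paper's evaluation code: the function name `rank1_and_map` and the toy data are hypothetical, and the same-camera filtering that standard ReID protocols usually apply is omitted for brevity.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) as fractions in [0, 1].

    dist[i][j] is the distance between query i and gallery item j.
    Simplification (an assumption of this sketch): no same-camera filtering.
    """
    rank1_hits = 0
    average_precisions = []
    for q, row in enumerate(dist):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(len(row)), key=lambda j: row[j])
        matches = [gallery_ids[j] == query_ids[q] for j in order]
        if not any(matches):
            continue  # skip queries with no true match in the gallery
        rank1_hits += int(matches[0])
        # Average precision: mean of precision at each true-match rank.
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not average_precisions:
        raise ValueError("no query has a matching gallery item")
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n


# Toy example: two queries, three gallery items.
query_ids = [1, 2]
gallery_ids = [1, 2, 1]
dist = [
    [0.1, 0.5, 0.3],  # query 0: both identity-1 items ranked first
    [0.4, 0.6, 0.2],  # query 1: its true match is ranked last
]
r1, mean_ap = rank1_and_map(dist, query_ids, gallery_ids)
print(f"Rank-1: {r1:.3f}, mAP: {mean_ap:.3f}")
```

Benchmark tables typically report these values as percentages, so the fractions returned here would be multiplied by 100.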