Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

About

Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behaviors might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
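The view-coherent augmentation idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each camera view has a 3x3 homography mapping image pixels to the shared ground plane, and all function names here (random_affine, augment_view, apply_h) are hypothetical. When an image is warped by a random affine transform, composing that view's homography with the inverse of the transform keeps every augmented pixel pointing at the same ground-plane location, which is what keeps the views consistent with one another.

```python
import numpy as np

def random_affine(rng, max_shift=20.0, max_scale=0.2):
    """Return a 3x3 matrix for a random scale-and-translate of image pixels."""
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty],
                     [0.0, 0.0, 1.0]])

def augment_view(img_to_ground, aug):
    """Update a view's image-to-ground homography after the image is warped by aug.

    Augmented pixels are p' = aug @ p, so the new homography must undo aug first.
    """
    return img_to_ground @ np.linalg.inv(aug)

def apply_h(h, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    out = homo @ h.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical homography for one camera (assumed values, for illustration only).
H = np.array([[0.01, 0.002, -1.0],
              [0.001, 0.02, -2.0],
              [0.0, 0.0001, 1.0]])

rng = np.random.default_rng(0)
A = random_affine(rng)
pixels = rng.uniform(0.0, 1000.0, size=(5, 2))

# Ground positions before augmentation, and after warping pixels and updating H.
ground_before = apply_h(H, pixels)
ground_after = apply_h(augment_view(H, A), apply_h(A, pixels))
consistent = np.allclose(ground_before, ground_after)
```

In a multiview setting, each camera would draw its own random transform and receive its own updated homography, so the ground-plane projections from all views still line up.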

Yunzhong Hou, Liang Zheng • 2021

Related benchmarks

Task	Dataset	Result
Multiview Pedestrian Detection	WILDTRACK (test)	MODA 91.5	46
Multiview Pedestrian Detection	MultiviewX (test)	MODA 93.7	35
Multi-View Detection	Wildtrack	MODA 91.5	32
Multi-view people detection	MultiviewX	MODA 93.7	29
Multi-view Multi-person Tracking	Wildtrack	MOTA 89.4	27
Pedestrian Detection	Wildtrack	MODA 91.5	21
Pedestrian Detection	MultiviewX	MODA 93.7	21
Multi-view people detection	CVCS	MODA 39.8	11
Multi-view crowd localization	SynMVCrowd	MODA 35.6	9
Subject Registration	CSRD-II (test)	Position Avg Error 2.41	8

