
CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability

About

Convolutional neural networks (CNNs) have dominated the field of computer vision for nearly a decade due to their strong ability to learn local features. However, because of their limited receptive field, CNNs fail to model the global context. The transformer, an attention-based architecture, can model the global context easily. Despite this, few studies have investigated the effectiveness of transformers in crowd counting. In addition, the majority of existing crowd counting methods are based on the regression of density maps, which requires a point-level annotation for each person present in the scene. This annotation task is laborious and error-prone, which has led to increased focus on weakly-supervised crowd counting methods that require only count-level annotations. In this paper, we propose a weakly-supervised method for crowd counting using a pyramid vision transformer. We have conducted extensive evaluations to validate the effectiveness of the proposed method. Our method performs comparably to the state of the art on the benchmark crowd datasets. More importantly, it shows remarkable generalizability.
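To make the difference between the two kinds of annotation concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names, the toy head coordinates, and the L1 loss on the count are all assumptions chosen for illustration. Density-map methods need a per-pixel target built from every head location, whereas a weakly-supervised method only needs the scalar count.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) head location in pixels


def density_target(points: List[Point], height: int, width: int) -> List[List[float]]:
    """Point-level target: a height x width grid with one unit of mass per head.

    Left unsmoothed here for brevity; density-map methods typically blur
    each point with a Gaussian kernel.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        col = min(max(int(x), 0), width - 1)
        row = min(max(int(y), 0), height - 1)
        grid[row][col] += 1.0
    return grid


def count_label(points: List[Point]) -> int:
    """Count-level label: only the number of people is kept."""
    return len(points)


def l1_count_loss(predicted_count: float, true_count: int) -> float:
    """Absolute error between predicted and true count (an assumed loss)."""
    return abs(predicted_count - true_count)


# Toy example: three annotated heads in a 4 x 4 image.
heads = [(1.2, 0.5), (3.9, 2.1), (0.0, 3.0)]
target_map = density_target(heads, height=4, width=4)
target_count = count_label(heads)
loss = l1_count_loss(2.5, target_count)
```

The density target sums to the same value as the count label, but building it requires every head position, which is the annotation cost that count-level supervision avoids.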

Siddharth Singh Savner, Vivek Kanhangad • 2022

Related benchmarks

Task            Dataset                      Result      Rank
Crowd Counting  UCF_CC_50                    MAE 229.6   63
Crowd Counting  UCF-QNRF                     MAE 93.3    49
Crowd Counting  SHA                          MAE 62.1    5
Crowd Counting  SHA to SHB cross-dataset     MAE 16      5
Crowd Counting  SHB                          MAE 8.5     5
Crowd Counting  SHB to SHA cross-dataset     MAE 121.6   4
Crowd Counting  QNRF to SHB cross-dataset    MAE 10.3    3
Crowd Counting  QNRF to SHA cross-dataset    MAE 70.8    3
Crowd Counting  SHA to QNRF cross-dataset    MAE 147.6   1
Crowd Counting  SHB to QNRF cross-dataset    MAE 304.4   1
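The results above are reported as MAE (mean absolute error), the average absolute difference between predicted and ground-truth counts over the test images. The sketch below shows that calculation; the function name and the counts are invented for illustration and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one image is required")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Hypothetical counts for three test images.
predicted_counts = [105.0, 48.5, 310.0]
true_counts = [100, 50, 300]
mae = mean_absolute_error(predicted_counts, true_counts)  # (5 + 1.5 + 10) / 3 = 5.5
```

A lower MAE means the predicted counts are closer to the true counts on average. In the cross-dataset rows, the model is trained on the first dataset and evaluated on the second.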
