Cross-head Supervision for Crowd Counting with Noisy Annotations
About
Noisy annotations, such as missing annotations and location shifts, are common in crowd counting datasets because of widely varying head scales, heavy occlusion, and similar factors. These noisy annotations severely affect model training, especially for density map-based methods. To alleviate their negative impact, we propose a novel crowd counting model with one convolution head and one transformer head, whose two heads supervise each other in noisy areas, a scheme we call Cross-Head Supervision. The resultant model, CHS-Net, can synergize different types of inductive biases for better counting. In addition, we develop a progressive cross-head supervision learning strategy to stabilize the training process and provide more reliable supervision. Extensive experimental results on the ShanghaiTech and QNRF datasets demonstrate superior performance over state-of-the-art methods. Code is available at https://github.com/RaccoonDML/CHSNet.
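The idea of two heads supervising each other in noisy areas, with the cross-head signal introduced progressively, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The noisy-area rule (pixels whose error against the annotation falls in the top `noisy_frac`), the linear ramp in `progressive_blend`, the MSE loss, and all function and parameter names are assumptions made here for illustration only.

```python
import numpy as np


def progressive_blend(epoch, total_epochs, max_blend=1.0):
    """Assumed schedule: ramp the cross-head weight linearly from 0 to max_blend."""
    return max_blend * min(1.0, epoch / max(1, total_epochs))


def cross_head_targets(pred_conv, pred_tran, gt, noisy_frac=0.1, blend=0.5):
    """Build one supervision target per head (illustrative only).

    pred_conv, pred_tran, gt: 2-D density maps of equal shape.
    noisy_frac: hypothetical fraction of pixels treated as noisy per head.
    blend: hypothetical weight of the other head's prediction in noisy pixels.
    """
    gt = np.asarray(gt, dtype=float)

    def targets_for(own, other):
        err = np.abs(own - gt)
        # Assumption: pixels with the largest error against the annotation are noisy.
        thresh = np.quantile(err, 1.0 - noisy_frac)
        noisy = err > thresh
        target = gt.copy()
        # In noisy pixels, mix the annotation with the other head's prediction.
        target[noisy] = (1.0 - blend) * gt[noisy] + blend * other[noisy]
        return target

    return targets_for(pred_conv, pred_tran), targets_for(pred_tran, pred_conv)


def chs_loss(pred_conv, pred_tran, gt, epoch, total_epochs, noisy_frac=0.1):
    """Sum of per-head MSE losses against the cross-head targets."""
    blend = progressive_blend(epoch, total_epochs)
    t_conv, t_tran = cross_head_targets(pred_conv, pred_tran, gt, noisy_frac, blend)
    return float(np.mean((pred_conv - t_conv) ** 2) + np.mean((pred_tran - t_tran) ** 2))
```

With `blend` at 0 both heads are trained on the annotations alone. As it grows, pixels flagged as noisy are increasingly supervised by the other head. In a real training loop the other head's prediction would normally be treated as a constant, with no gradient flowing through it.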
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Crowd Counting | ShanghaiTech Part A (test) | MAE 59.2 | 227 |
| Crowd Counting | ShanghaiTech Part B (test) | MAE 7.1 | 191 |
| Crowd Counting | UCF-QNRF (test) | MAE 83.4 | 95 |
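For reference, MAE in crowd counting is conventionally the mean absolute difference between the predicted and annotated count per image, where the predicted count is the sum of the predicted density map. The helper below is a minimal sketch of that convention. The name `count_mae` and its inputs are hypothetical, and it is not the evaluation script behind the numbers above.

```python
import numpy as np


def count_mae(pred_density_maps, gt_counts):
    """Mean absolute error between predicted and ground-truth counts.

    pred_density_maps: iterable of 2-D arrays, one per test image.
    gt_counts: iterable of annotated head counts, one per test image.
    """
    # The predicted count of an image is the integral (sum) of its density map.
    pred_counts = np.array([float(np.sum(d)) for d in pred_density_maps])
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(pred_counts - gt)))
```

Lower is better: an MAE of 7.1 means predictions are off by about seven people per image on average.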