Leveraging Unlabeled Data for Crowd Counting by Learning to Rank
About
We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of the limited size of existing datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results.
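
The ranking constraint described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `nested_crops` and `ranking_hinge_loss`, the centered cropping scheme, the shrink factor, and the zero margin are all assumptions made for the example. The predicted counts would come from a counting network, which is not shown here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def nested_crops(width: int, height: int, k: int = 5, scale: float = 0.75) -> List[Box]:
    """Return k crop boxes, largest first, each contained in the previous one.

    Hypothetical scheme: all crops share the image center and shrink by
    `scale` at every step. The first box is the full image.
    """
    cx, cy = width / 2.0, height / 2.0
    w, h = float(width), float(height)
    boxes: List[Box] = []
    for _ in range(k):
        boxes.append((
            int(round(cx - w / 2)),
            int(round(cy - h / 2)),
            int(round(cx + w / 2)),
            int(round(cy + h / 2)),
        ))
        w *= scale
        h *= scale
    return boxes


def ranking_hinge_loss(counts: List[float], margin: float = 0.0) -> float:
    """Average pairwise hinge loss over predicted counts, largest crop first.

    A pair contributes a penalty only when a smaller crop is predicted to
    contain more people than a crop that encloses it.
    """
    total, pairs = 0.0, 0
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            total += max(0.0, counts[j] - counts[i] + margin)
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    print(nested_crops(640, 480, k=3, scale=0.5))
    print(ranking_hinge_loss([10.0, 8.0, 5.0]))  # ordering respected
    print(ranking_hinge_loss([5.0, 8.0]))        # ordering violated
```

Because the loss needs only the containment order of the crops, it can be computed on images with no count annotations, which is what makes the unlabeled data usable.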
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Crowd Counting | ShanghaiTech Part B | MAE 13.7 | 160 |
| Crowd Counting | ShanghaiTech Part A | MAE 72 | 138 |
| Crowd Counting | UCF-QNRF | MAE 145.1 | 48 |
| Crowd Counting | UCF_CC_50 (5-fold cross-validation) | MAE 279.6 | 43 |
| Multi-view Crowd Counting | PETS 2009 (test) | MAE 3.45 | 27 |
| Multi-view Crowd Counting | CityStreet (test) | MAE 7.12 | 27 |
| Crowd Counting | JHU-Crowd++ | MAE 87.5 | 23 |
| Crowd Counting | UCF_CC_50 Transfer Learning | MAE 337.6 | 3 |
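
The results in the table are reported as MAE, the mean absolute error between predicted and ground-truth person counts over the test images (lower is better). A minimal sketch of the metric follows; the function name `mean_absolute_error` and the sample numbers are illustrative only and are not taken from any benchmark.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted count - true count| across images."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one image is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Made-up counts for two images.
    print(mean_absolute_error([10.0, 20.0], [12.0, 17.0]))
```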