Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection
About
We propose a deep neural network fusion architecture for fast and robust pedestrian detection. The architecture allows multiple networks to be processed in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates across different sizes and occlusion levels. This network outputs a large variety of pedestrian candidates so as to cover the majority of ground-truth pedestrians, at the cost of also introducing many false positives. Multiple deep neural networks are then used in parallel to further refine these pedestrian candidates. We introduce a soft-rejection based network fusion method that combines the soft confidence metrics from all networks into final confidence scores. Our method outperforms existing state-of-the-art approaches, especially when detecting small or occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. The approach outperforms state-of-the-art methods on most protocols of the Caltech Pedestrian dataset, with significant gains on several of them, and is also faster than all other methods.
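The core idea of soft rejection is that each secondary classifier scales a candidate's confidence instead of accepting or rejecting it outright. The sketch below is an illustrative Python implementation of that idea only; the function name, the boost threshold, and the lower-bound floor are assumptions chosen for clarity, not values taken from the paper.

```python
def soft_rejection_fuse(candidate_score, classifier_probs,
                        boost_thresh=0.7, floor=0.1):
    """Fuse secondary-classifier probabilities into one confidence score.

    Each classifier contributes a multiplicative factor: a confident
    classifier (prob >= boost_thresh) boosts the candidate's score, a
    less confident one attenuates it. The factor is clipped below by
    `floor` so that no single network can hard-veto a candidate.
    Both `boost_thresh` and `floor` are illustrative defaults.
    """
    score = candidate_score
    for p in classifier_probs:
        score *= max(p / boost_thresh, floor)
    return score


# Example: a candidate with detector score 0.9, judged by two classifiers.
fused = soft_rejection_fuse(0.9, [0.8, 0.3])
```

Because every factor is bounded below, a candidate that one refinement network dislikes is merely down-weighted, which is what makes the fusion robust to an individual network's mistakes.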
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Pedestrian Detection | Caltech | MR | 8.65 | 17 |
| Pedestrian Detection | Caltech standard (test) | Detection Rate (Reasonable) | 8.18 | 11 |
| Pedestrian Detection | Caltech reasonable setting (test) | Miss Rate | 8.18 | 9 |
| Pedestrian Detection | Caltech Pedestrian (test) | Reasonable L-AMR | 8.18 | 8 |
| Pedestrian Detection | Caltech Pedestrian | Latency (s/image) | 0.16 | 8 |