A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
About
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
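The two core ideas in the abstract are detection at several output layers, each matched to a range of object scales, and training through a single multi-task loss. The Python sketch below illustrates both. It is a minimal illustration under stated assumptions, not the authors' implementation. The layer scales in `LAYER_SCALES`, the log-space matching rule in `assign_layer`, the per-layer weights `alpha`, and the balance term `lam` are all hypothetical. The per-sample loss uses the common form of cross-entropy classification plus smooth L1 box regression, and only foreground samples incur the localization term.

```python
import math

# Hypothetical object scales (pixels) handled by three detection output layers,
# shallow to deep. Real values depend on each layer's receptive field.
LAYER_SCALES = [32, 64, 128]


def assign_layer(box_height, scales=LAYER_SCALES):
    """Pick the output layer whose scale is closest to the object in log space (assumed rule)."""
    return min(range(len(scales)), key=lambda i: abs(math.log(box_height / scales[i])))


def cross_entropy(probs, label):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(max(probs[label], 1e-12))


def smooth_l1(pred, target):
    """Box-regression loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multitask_loss(samples, alpha, lam=1.0):
    """Weighted sum over output layers of per-sample losses.

    Each sample is a dict with keys: layer, probs, label, pred_box, gt_box.
    Label 0 is background; labels >= 1 are object classes.
    No per-layer normalization is applied (an assumption of this sketch).
    """
    total = 0.0
    for s in samples:
        loss = cross_entropy(s["probs"], s["label"])
        if s["label"] >= 1:  # localization loss only for foreground samples
            loss += lam * smooth_l1(s["pred_box"], s["gt_box"])
        total += alpha[s["layer"]] * loss
    return total


if __name__ == "__main__":
    samples = [
        {"layer": assign_layer(30), "probs": [0.5, 0.5], "label": 1,
         "pred_box": [0.0, 0.0], "gt_box": [0.5, 2.0]},
        {"layer": assign_layer(70), "probs": [0.5, 0.5], "label": 0,
         "pred_box": None, "gt_box": None},
    ]
    print(multitask_loss(samples, alpha=[1.0, 1.0, 1.0]))
```

Routing each object to a single scale-matched layer means each detector is trained only on objects whose size its receptive field can cover. At test time, the abstract describes combining the complementary detectors into one multi-scale detector.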
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | KITTI (test) | -- | -- | 35 |
| 2D Vehicle Detection | KITTI (test) | AP (Easy, %) | 90.03 | 29 |
| Pedestrian Detection | KITTI Hard (val) | AP (%) | 64.08 | 12 |
| Pedestrian Detection | KITTI Moderate (val) | AP (%) | 72.26 | 12 |
| Pedestrian Detection | KITTI Easy (val) | AP (%) | 76.38 | 12 |
| Pedestrian Detection | Caltech standard (test) | Detection Rate (Reasonable) | 9.95 | 11 |
| Image Dehazing | HazeRD 25 | CIEDE2000 | 13.7952 | 11 |
| Image Dehazing | O-HAZE 1 (test) | PSNR (dB) | 19.07 | 11 |
| Image Dehazing | RESIDE SOTS 18 | PSNR (dB) | 17.57 | 11 |
| Pedestrian Detection | Caltech reasonable setting (test) | Miss Rate (%) | 9.95 | 9 |