ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
About
The ability to perform pixel-wise semantic segmentation in real time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task require a large number of floating-point operations and have long run-times, which hinders their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low-latency operation. ENet is up to 18× faster, requires 75× fewer FLOPs, has 79× fewer parameters, and provides accuracy similar to or better than existing models. We have tested it on the CamVid, Cityscapes, and SUN datasets and report comparisons with existing state-of-the-art methods, as well as the trade-offs between a network's accuracy and its processing time. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.
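The FLOP and parameter comparisons above depend on how the cost of a convolution is counted. The sketch below is a minimal illustration, not ENet's actual layer configuration: it counts parameters and multiply-accumulates (MACs) for a plain k×k convolution and for a generic ResNet-style bottleneck (1×1 reduce, k×k conv, 1×1 expand). The feature-map size, channel count, and reduction factor are hypothetical values chosen only for the example, and biases, normalization, and activations are ignored.

```python
def conv_cost(h, w, c_in, c_out, k):
    """Parameters and MACs of a k x k conv (stride 1, 'same' padding, no bias)."""
    params = k * k * c_in * c_out
    macs = params * h * w  # each weight is used once per output pixel
    return params, macs


def bottleneck_cost(h, w, c, reduce, k):
    """Generic bottleneck: 1x1 reduce -> k x k conv -> 1x1 expand back to c channels."""
    mid = c // reduce
    parts = [
        conv_cost(h, w, c, mid, 1),    # 1x1 projection to fewer channels
        conv_cost(h, w, mid, mid, k),  # main k x k conv on the reduced channels
        conv_cost(h, w, mid, c, 1),    # 1x1 expansion back to c channels
    ]
    return sum(p for p, _ in parts), sum(m for _, m in parts)


if __name__ == "__main__":
    # Hypothetical layer: 64x64 feature map, 128 channels, 3x3 kernel, 4x reduction.
    plain = conv_cost(64, 64, 128, 128, 3)
    bneck = bottleneck_cost(64, 64, 128, 4, 3)
    print("plain:", plain)
    print("bottleneck:", bneck)
    print("param ratio:", round(plain[0] / bneck[0], 2))
```

For these hypothetical sizes the bottleneck uses roughly 8.5× fewer parameters and MACs than the plain convolution. Note that papers differ on whether one MAC counts as one or two FLOPs, so absolute FLOP figures are only comparable under the same convention.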
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU | 80.4 | 1145 |
| Semantic segmentation | Cityscapes (val) | mIoU | 58.3 | 572 |
| Semantic segmentation | CamVid (test) | mIoU | 68.3 | 411 |
| Semantic segmentation | Cityscapes (val) | mIoU | 58.3 | 332 |
| Semantic segmentation | Cityscapes (val) | mIoU | 58.3 | 287 |
| Lane detection | TuSimple (test) | Accuracy | 93.02 | 250 |
| Semantic segmentation | SUN RGB-D (test) | mIoU | 0.197 | 191 |
| Semantic segmentation | Cityscapes (val) | mIoU | 58.3 | 108 |
| Semantic segmentation | Trans10K v2 (test) | mIoU | 8.5 | 104 |
| Semantic segmentation | Mapillary Vistas (val) | mIoU | 47 | 72 |
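Most rows above report mIoU (mean intersection over union). The following is a minimal pure-Python sketch of the usual computation, assuming integer class labels per pixel: accumulate a confusion matrix, compute per-class IoU = TP / (TP + FP + FN), and average. The `ignore_index` parameter and the choice to skip classes that appear in neither the ground truth nor the prediction are assumptions of this sketch; each benchmark's official evaluation script defines its own exact protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_index=None):
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_index:  # e.g. unlabeled pixels excluded from evaluation
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # class c pixels predicted as something else
        fp = sum(row[c] for row in cm) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Six flattened "pixels" over three classes.
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
    print(cm)            # per-class IoUs are 1/3, 2/3, 1/2
    print(mean_iou(cm))  # their mean
```

In practice the confusion matrix is accumulated over the entire evaluation split before IoU is computed, rather than averaging per-image scores.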