FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
About
Differentiable Neural Architecture Search (DNAS) has demonstrated great success in designing state-of-the-art, efficient neural networks. However, DARTS-based DNAS's search space is small when compared to other search methods', since all candidate network layers must be explicitly instantiated in memory. To address this bottleneck, we propose a memory- and computationally efficient DNAS variant: DMaskingNAS. This algorithm expands the search space by up to $10^{14}\times$ over conventional DNAS, supporting searches over spatial and channel dimensions that are otherwise prohibitively expensive: input resolution and number of filters. We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands. Furthermore, we employ effective shape propagation to maximize per-FLOP or per-parameter accuracy. The searched FBNetV2s yield state-of-the-art performance when compared with all previous architectures. With up to $421\times$ less search cost, DMaskingNAS finds models with 0.9% higher accuracy and 15% fewer FLOPs than MobileNetV3-Small, and models with similar accuracy but 20% fewer FLOPs than EfficientNet-B0. Furthermore, our FBNetV2 outperforms MobileNetV3 by 2.6% in accuracy, with equivalent model size. FBNetV2 models are open-sourced at https://github.com/facebookresearch/mobile-vision.
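The masking idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the candidates are channel counts, that each candidate is represented by a binary mask over a single shared feature map, and that the candidate weights come from a Gumbel-softmax sample (the abstract does not name the relaxation). The function names, candidate widths, and logits are all hypothetical.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Sample relaxed one-hot weights from logits (assumed relaxation)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def combined_channel_mask(candidate_widths, weights, max_width):
    """Weighted sum of binary masks, one mask per candidate channel count."""
    mask = [0.0] * max_width
    for width, weight in zip(candidate_widths, weights):
        # The mask for a candidate keeps its first `width` channels.
        for c in range(width):
            mask[c] += weight
    return mask

def apply_mask(feature_map, mask):
    """Scale each channel (a list of activations) by its mask entry."""
    return [[m * x for x in channel] for channel, m in zip(feature_map, mask)]

# Hypothetical search over three channel counts for one layer.
candidate_widths = [2, 4, 6]
logits = [0.0, 0.0, 0.0]  # hypothetical architecture parameters
weights = gumbel_softmax(logits, temperature=1.0, rng=random.Random(0))
mask = combined_channel_mask(candidate_widths, weights, max_width=6)

# One feature map with 6 channels is computed once and reused by all candidates.
features = [[1.0, 1.0] for _ in range(6)]
masked = apply_mask(features, mask)
```

Under these assumptions, the layer is evaluated once at the maximum width and the candidates differ only in the mask, so adding more candidate channel counts adds mask entries rather than extra feature maps.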
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet (val) | Top-1 Acc | 78.1 | 1206 |
| Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) | 77.2 | 1155 |
| Image Classification | ImageNet 1k (test) | Top-1 Accuracy | 78.2 | 798 |
| Semantic segmentation | Cityscapes | mIoU | 72.6 | 578 |
| Image Classification | ImageNet | Top-1 Accuracy | 75.2 | 429 |
| Image Classification | ImageNet (test) | Top-1 Accuracy | 77.2 | 291 |
| Semantic segmentation | COCO Stuff | mIoU | 28.5 | 195 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 68.3 | 188 |
| Semantic segmentation | Pascal VOC | mIoU | 0.736 | 172 |
| Image Classification | ImageNet-1K 1 (val) | Top-1 Accuracy | 0.76 | 119 |