CARAFE: Content-Aware ReAssembly of FEatures
About
Feature upsampling is a key operation in a number of modern convolutional network architectures, e.g. feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE has several appealing properties: (1) Large field of view. Unlike previous works (e.g. bilinear interpolation) that only exploit the sub-pixel neighborhood, CARAFE can aggregate contextual information within a large receptive field. (2) Content-aware handling. Instead of using a fixed kernel for all samples (e.g. deconvolution), CARAFE enables instance-specific content-aware handling, which generates adaptive kernels on-the-fly. (3) Lightweight and fast to compute. CARAFE introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting. CARAFE shows consistent and substantial gains across all the tasks (1.2%, 1.3%, 1.8%, 1.1 dB respectively) with negligible computational overhead. It has great potential to serve as a strong building block for future research. Code and models are available at https://github.com/open-mmlab/mmdetection.
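To make properties (1) and (2) concrete, here is a minimal NumPy sketch of a content-aware reassembly step: each upsampled location is a weighted sum over a k_up × k_up neighborhood of the source feature map, with a separate kernel for every output location. The abstract does not say how the kernels are predicted, so this sketch takes them as an input. The function name `reassemble`, the parameters `scale` and `k_up`, the zero padding at the border, and the random kernels in the usage lines are all assumptions for illustration; this is not the implementation shipped in mmdetection.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax, used so each kernel sums to 1."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reassemble(features, kernels, scale, k_up):
    """Upsample `features` with one reassembly kernel per output location.

    features: array of shape (C, H, W)
    kernels:  array of shape (scale*H, scale*W, k_up*k_up); in this sketch
              the kernels are given, not predicted from the features
    scale:    integer upsampling factor
    k_up:     odd kernel size (neighborhood is k_up x k_up)
    Returns an array of shape (C, scale*H, scale*W).
    """
    C, H, W = features.shape
    r = k_up // 2
    # Zero padding at the border is an assumption of this sketch.
    padded = np.pad(features, ((0, 0), (r, r), (r, r)))
    out = np.zeros((C, scale * H, scale * W), dtype=float)
    for i in range(scale * H):
        for j in range(scale * W):
            # Source location that this output location maps back to.
            si, sj = i // scale, j // scale
            # k_up x k_up neighborhood centered on (si, sj), all channels.
            patch = padded[:, si:si + k_up, sj:sj + k_up]
            weights = kernels[i, j].reshape(k_up, k_up)
            # The same spatial kernel is shared across channels.
            out[:, i, j] = (patch * weights).sum(axis=(1, 2))
    return out


# Usage with random stand-in kernels (hypothetical; a real model would
# predict these from the input features).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
logits = rng.standard_normal((16, 16, 25))
y = reassemble(x, softmax(logits), scale=2, k_up=5)
print(y.shape)
```

If every kernel puts all of its weight on the center tap, the sketch reduces to nearest-neighbor upsampling. That is a fixed, content-independent kernel, the kind of scheme the abstract contrasts CARAFE against.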
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU 54.48 | 2731 |
| Instance segmentation | COCO 2017 (val) | -- | 1144 |
| Semantic segmentation | ADE20K | mIoU 51.85 | 936 |
| Semantic segmentation | PASCAL VOC (val) | mIoU 42.39 | 338 |
| Panoptic segmentation | COCO 2017 (val) | PQ 40.8 | 172 |
| Semantic segmentation | Pascal VOC 21 classes (val) | mIoU 0.8026 | 103 |
| Semantic segmentation | COCO Stuff-27 (val) | mIoU 59.73 | 75 |
| Depth estimation | NYU v2 (val) | RMSE 1.09 | 53 |
| Semantic segmentation | ADE20K 150 classes (val) | mIoU 38.3 | 35 |
| Semantic segmentation | Cityscapes 27 classes (val) | mIoU 56.05 | 11 |