MogaNet: Multi-order Gated Aggregation Network

About

By contextualizing the kernel as global as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals the representation bottleneck of modern ConvNets, where the expressive interactions have not been effectively encoded with the increased kernel size. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.

Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | ADE20K (val) | mIoU 54 | 3069 |
| Object Detection | COCO 2017 (val) | AP 48.7 | 2843 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy 83.4 | 2238 |
| Instance Segmentation | COCO 2017 (val) | APm 0.488 | 1275 |
| Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) 87.8 | 1171 |
| Semantic segmentation | ADE20K | mIoU 49.2 | 1028 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 84.7 | 920 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy 84.3 | 708 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP 77.3 | 386 |
| Image Classification | ImageNet-1k 1.0 (test) | Top-1 Accuracy 0.847 | 191 |
Showing 10 of 28 rows
