InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

About

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs. The code will be released at https://github.com/OpenGVLab/InternImage.
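The adaptive spatial aggregation described above can be illustrated with a small pure-Python sketch of a single 3x3 deformable-convolution output. This is not the InternImage operator itself: the function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets and modulation scalars are passed in by hand, whereas in a real layer they would be predicted from the input features.

```python
import math


def bilinear_sample(image, y, x):
    """Sample image (a list of rows) at fractional (y, x); pixels outside count as 0."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                value += wy * wx * image[yi][xi]
    return value


def deformable_conv_at(image, weights, offsets, modulation, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    weights[k]    -- kernel weight for sampling point k (fixed after training)
    offsets[k]    -- (dy, dx) shift for point k (assumed given; input-dependent in practice)
    modulation[k] -- scalar gate for point k (assumed given; input-dependent in practice)
    """
    # Regular 3x3 grid; each point is displaced by its own offset before sampling.
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        sample = bilinear_sample(image, cy + gy + oy, cx + gx + ox)
        out += weights[k] * modulation[k] * sample
    return out
```

With all offsets set to `(0.0, 0.0)` and all modulation scalars set to `1.0`, the function reduces to an ordinary 3x3 convolution; non-zero offsets move the sampling points off the regular grid, which is what lets the receptive field adapt to the input.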

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao • 2022

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 53.9	3069
Object Detection	COCO 2017 (val)	AP 64.2	2843
Image Classification	ImageNet-1K 1.0 (val)	Top-1 Accuracy 84.9	2238
Image Classification	ImageNet-1k (val)	Top-1 Accuracy 89.2	1498
Instance Segmentation	COCO 2017 (val)	APm 0.488	1275
Semantic segmentation	Cityscapes (test)	mIoU 86.1	1252
Object Detection	COCO (test-dev)	mAP 64.3	1239
Image Classification	ImageNet-1K	Top-1 Acc 84.9	1239
Semantic segmentation	ADE20K	mIoU 62.9	1028
Object Detection	PASCAL VOC 2007 (test)	--	844

Showing 10 of 58 rows
