Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

About

This paper proposes the paradigm of large convolutional kernels for designing modern Convolutional Neural Networks (ConvNets). We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy. Our work introduces a set of architecture design guidelines for large-kernel ConvNets that optimize their efficiency and performance. We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets, emphasizing their unique ability to capture extensive spatial information without deep layer stacking. The resulting model not only surpasses its predecessors, with an ImageNet accuracy of 88.0%, an ADE20K mIoU of 55.6%, and a COCO box AP of 56.4%, but also demonstrates impressive scalability and performance on other modalities such as time-series forecasting, audio, point cloud, and video recognition. These results indicate the universal modeling abilities of large-kernel ConvNets, which also offer faster inference than vision transformers. Our findings reveal that large-kernel ConvNets possess larger effective receptive fields and a higher shape bias, moving away from the texture bias typical of smaller-kernel CNNs. All code and models are publicly available at https://github.com/AILab-CVC/UniRepLKNet, promoting further research and development in the community.
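To make the "few large kernels versus many stacked small ones" trade-off concrete, here is a minimal arithmetic sketch, assuming stride-1, dilation-1 depthwise convolutions without bias. The helper names (`stacked_receptive_field`, `depthwise_params`) and the 13x13 / 64-channel sizes are illustrative choices, not taken from the paper or its code. The sketch computes only the theoretical receptive field; the paper's claim concerns the effective receptive field, i.e. how much of that window actually influences the output, which this arithmetic does not measure.

```python
def stacked_receptive_field(kernel_sizes):
    """Theoretical receptive field of stride-1, dilation-1 convs applied in sequence."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each k x k layer widens the window by k - 1 pixels
    return rf


def depthwise_params(channels, k):
    """Weight count of a k x k depthwise conv (one filter per channel, no bias)."""
    return channels * k * k


# Illustrative sizes only (assumptions, not the paper's configuration).
target = 13
channels = 64

# Each 3x3 layer adds 2 pixels, so this many layers reach the same window.
n_layers = (target - 1) // (3 - 1)

print(n_layers, stacked_receptive_field([3] * n_layers))  # 6 13
print(stacked_receptive_field([target]))                  # 13
print(depthwise_params(channels, target),
      n_layers * depthwise_params(channels, 3))           # 10816 3456
```

Both configurations cover the same 13x13 window, but the stacked version needs six layers of depth to get there, while the single large kernel reaches it in one layer at the cost of more weights per layer.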

Yiyuan Zhang, Xiaohan Ding, Xiangyu Yue · 2024

Related benchmarks

Task	Dataset	Metric	Result
Semantic segmentation	ADE20K (val)	mIoU (%)	48.6	3089
Image Classification	ImageNet-1k (val)	Top-1 Accuracy (%)	83.9	871
Object Detection	COCO	AP (Box)	53	186
