
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

About

Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Nonetheless, recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. Thus, the hidden factors of large kernel convolution that affect model performance remain unexplored. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways. To this end, we leverage the multi-path long-distance sparse dependency relationship to enhance feature utilization via the proposed Shiftwise (SW) convolution operator with a pure CNN architecture. In a wide range of vision tasks such as classification, segmentation, and detection, SW surpasses state-of-the-art transformer and CNN architectures, including SLaK and UniRepLKNet. More importantly, our experiments demonstrate that 3×3 convolutions can replace large convolutions in existing large kernel CNNs to achieve comparable effects, which may inspire follow-up works. Code and all models are available at https://github.com/lidc54/shift-wiseConv.
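The claim that small kernels can reproduce a large-kernel effect rests on a simple identity: a tall or wide "band" kernel (as used in SLaK-style large-kernel CNNs) can be decomposed into 3×3 tiles, each applied to a spatially shifted copy of the input and summed. The sketch below is not the authors' Shiftwise implementation (which adds sparsity and multi-path fusion); it only verifies the underlying shift-and-sum equivalence in plain NumPy, with a hand-rolled `conv2d_valid` helper:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' cross-correlation, loop-based for clarity."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
K = rng.standard_normal((9, 3))   # tall "band" kernel, stand-in for a large kernel

# Direct large-kernel convolution.
y_large = conv2d_valid(x, K)

# Same result from three 3x3 tiles of K, each convolved with a
# vertically shifted view of the input, then summed.
y_sum = np.zeros_like(y_large)
for m in range(3):
    k_small = K[3 * m:3 * m + 3, :]            # one 3x3 tile of the large kernel
    y_part = conv2d_valid(x[3 * m:, :], k_small)
    y_sum += y_part[:y_large.shape[0], :]      # crop to the common output region

print(np.allclose(y_large, y_sum))  # → True
```

The equivalence is exact here because the tiles partition the large kernel; the paper's contribution is showing that, with sparse multi-path fusion, such small-kernel pathways can match or beat dense large kernels in accuracy, not just reproduce them.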

Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li • 2024

Related benchmarks

Task                           Dataset               Result                Rank
Semantic Segmentation          ADE20K (val)          -                     2989
Image Classification           ImageNet-1K 1.0 (val) Top-1 Accuracy 83.9   2099
Object Detection               COCO (val)            -                     633
Instance Segmentation          COCO (val)            APmk 45.67            475
Monocular 3D Object Detection  nuScenes (val)        mAP 31.42             11
