
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

About

Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Nonetheless, recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. Thus, the hidden factors of large kernel convolution that affect model performance remain unexplored. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways. To this end, we leverage the multi-path long-distance sparse dependency relationship to enhance feature utilization via the proposed Shiftwise (SW) convolution operator with a pure CNN architecture. In a wide range of vision tasks such as classification, segmentation, and detection, SW surpasses state-of-the-art transformer and CNN architectures, including SLaK and UniRepLKNet. More importantly, our experiments demonstrate that 3×3 convolutions can replace large convolutions in existing large kernel CNNs to achieve comparable effects, which may inspire follow-up works. Code and all models are available at https://github.com/lidc54/shift-wiseConv.
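The claim that small kernels can reproduce a large-kernel effect rests on a simple identity: a tall or wide "band" kernel (as used in SLaK-style large-kernel CNNs) can be decomposed into 3×3 tiles, each applied to a spatially shifted copy of the input and summed. The sketch below is not the authors' Shiftwise implementation (which adds sparsity and multi-path fusion); it only verifies the underlying shift-and-sum equivalence in plain NumPy, with a hand-rolled `conv2d_valid` helper:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' cross-correlation, loop-based for clarity."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
K = rng.standard_normal((9, 3))   # tall "band" kernel, stand-in for a large kernel

# Direct large-kernel convolution.
y_large = conv2d_valid(x, K)

# Same result from three 3x3 tiles of K, each convolved with a
# vertically shifted view of the input, then summed.
y_sum = np.zeros_like(y_large)
for m in range(3):
    k_small = K[3 * m:3 * m + 3, :]            # one 3x3 tile of the large kernel
    y_part = conv2d_valid(x[3 * m:, :], k_small)
    y_sum += y_part[:y_large.shape[0], :]      # crop to the common output region

print(np.allclose(y_large, y_sum))  # → True
```

The equivalence is exact here because the tiles partition the large kernel; the paper's contribution is showing that, with sparse multi-path fusion, such small-kernel pathways can match or beat dense large kernels in accuracy, not just reproduce them.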

Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li • 2024

Related benchmarks

Task                           Dataset               Result                Rank
Semantic Segmentation          ADE20K (val)          -                     2989
Image Classification           ImageNet-1K 1.0 (val) Top-1 Accuracy 83.9   2099
Object Detection               COCO (val)            -                     633
Instance Segmentation          COCO (val)            APmk 45.67            475
Monocular 3D Object Detection  nuScenes (val)        mAP 31.42             11
