
Building Vision Models upon Heat Conduction

About

Visual representation models that rely on attention mechanisms incur significant computational overhead, particularly when pursuing large receptive fields. In this study, we aim to mitigate this challenge by introducing the Heat Conduction Operator (HCO), built upon the physical principle of heat conduction. HCO treats image patches as heat sources and models their correlations through adaptive thermal energy diffusion, enabling robust visual representations. Because it can be implemented using discrete cosine transform (DCT) operations, HCO has a computational complexity of O(N^1.5). HCO is plug-and-play: combining it with deep learning backbones produces visual representation models (termed vHeat) with global receptive fields. Experiments across vision tasks demonstrate that, beyond stronger performance, vHeat achieves up to 3x throughput, 80% less GPU memory allocation, and 35% fewer computational FLOPs compared to the Swin-Transformer. Code is available at https://github.com/MzeroMiko/vHeat.
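
The idea of diffusing heat through a DCT can be illustrated with a minimal NumPy sketch. It is not taken from the vHeat repository: the function names (dct_matrix, heat_conduct), the fixed scalar diffusivity, and the use of continuous-frequency decay exp(-k(wx^2 + wy^2)t) are assumptions made for illustration, whereas the abstract describes the diffusion as adaptive. The sketch transforms a patch grid with an orthonormal DCT-II, damps each frequency, and transforms back. Applying the DCT as two matrix products on a sqrt(N) x sqrt(N) grid costs on the order of N^1.5 operations, which is consistent with the complexity quoted above.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); its inverse is its transpose."""
    freq = np.arange(n)[:, None]   # frequency index (rows)
    pos = np.arange(n)[None, :]    # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * pos + 1) * freq / (2 * n))
    c[0, :] /= np.sqrt(2.0)        # rescale the DC row so that c @ c.T == I
    return c

def heat_conduct(u0, diffusivity=1.0, t=1.0):
    """Diffuse a 2D 'temperature' map u0 for time t (hypothetical helper).

    Assumes a constant scalar diffusivity; the cosine basis corresponds to
    insulated (Neumann) boundaries, so total heat is conserved.
    """
    h, w = u0.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    spectrum = ch @ u0 @ cw.T                  # 2D DCT as two matrix products
    wy = np.pi * np.arange(h) / h              # vertical frequencies
    wx = np.pi * np.arange(w) / w              # horizontal frequencies
    decay = np.exp(-diffusivity * (wy[:, None] ** 2 + wx[None, :] ** 2) * t)
    return ch.T @ (spectrum * decay) @ cw      # inverse 2D DCT

# A single hot patch on an 8x8 grid spreads out over the whole grid.
u0 = np.zeros((8, 8))
u0[3, 4] = 1.0
u1 = heat_conduct(u0, diffusivity=1.0, t=0.5)
```

Every output location depends on every input location after a single call, which is one way to read the "global receptive field" claim; high frequencies decay fastest, so the result is a smoothed version of the input.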

Zhaozhi Wang, Yue Liu, Yunjie Tian, Yunfan Liu, Yaowei Wang, Qixiang Ye • 2024

Related benchmarks

Task                      Dataset               Result                 Rank
Semantic segmentation     ADE20K (val)          mIoU 49.6              2888
Object Detection          COCO 2017 (val)       --                     2643
Instance Segmentation     COCO 2017 (val)       APm 0.437              1201
Semantic segmentation     ADE20K                mIoU 49.6              1024
Image Classification      ImageNet A            Top-1 Acc 36.8         654
Image Classification      ImageNet 1k (test)    Top-1 Accuracy 84      450
Object Detection          COCO 2017             AP (Box) 48.8          321
Instance Segmentation     COCO 2017             APm 43.7               226
Image Classification      ObjectNet             Top-1 Accuracy 26.7    219
JPEG artifact reduction   LIVE1                 PSNR 34.64             121
Showing 10 of 13 rows
