Building Vision Models upon Heat Conduction

About

Visual representation models that rely on attention mechanisms incur significant computational overhead, particularly when pursuing large receptive fields. In this study, we aim to mitigate this challenge by introducing the Heat Conduction Operator (HCO), built upon the physical principle of heat conduction. HCO treats image patches as heat sources and models their correlations through adaptive thermal energy diffusion, enabling robust visual representations. Because it can be implemented using discrete cosine transform (DCT) operations, HCO has a computational complexity of O(N^1.5). HCO is plug-and-play: combining it with deep learning backbones produces visual representation models (termed vHeat) with global receptive fields. Experiments across vision tasks demonstrate that, beyond stronger performance, vHeat achieves up to 3x the throughput, 80% less GPU memory allocation, and 35% fewer FLOPs compared to the Swin-Transformer. Code is available at https://github.com/MzeroMiko/vHeat.
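
To make the DCT-based formulation above concrete, here is a minimal NumPy sketch of plain heat diffusion on a 2D grid of patch values, not the authors' implementation. It assumes insulated (Neumann) boundaries, so the solution is a DCT, a per-frequency exponential decay, and an inverse DCT. The names `dct_matrix` and `heat_conduction`, and the fixed scalar diffusivity `k`, are hypothetical choices for illustration; the abstract describes the diffusion as adaptive, which this sketch does not model.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n): C @ x transforms, C.T @ X inverts."""
    freq = np.arange(n)[:, None]   # frequency index (rows)
    pos = np.arange(n)[None, :]    # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * pos + 1) * freq / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)     # DC row uses a different normalization
    return c


def heat_conduction(u0, k=1.0, t=1.0):
    """Diffuse an (H, W) field u0 for time t with scalar diffusivity k (assumed)."""
    h, w = u0.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT as two matrix products (one per axis).
    coeffs = ch @ u0 @ cw.T
    # Each cosine mode decays as exp(-k * (wx^2 + wy^2) * t).
    wy = np.pi * np.arange(h)[:, None] / h
    wx = np.pi * np.arange(w)[None, :] / w
    decay = np.exp(-k * (wx ** 2 + wy ** 2) * t)
    # Inverse 2D DCT, again as two matrix products.
    return ch.T @ (coeffs * decay) @ cw


# Usage: a single hot patch spreads out while total heat is conserved.
u0 = np.zeros((8, 8))
u0[3, 4] = 1.0
u = heat_conduction(u0, k=1.0, t=0.5)
```

For an H x W grid with N = HW patches, the matrix products cost O(H^2 W + H W^2), which is O(N^1.5) when H = W. This is consistent with the complexity quoted in the abstract, although whether the authors compute the DCT this way is an assumption here.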

Zhaozhi Wang, Yue Liu, Yunjie Tian, Yunfan Liu, Yaowei Wang, Qixiang Ye • 2024

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 49.6	3069
Object Detection	COCO 2017 (val)	--	2843
Instance Segmentation	COCO 2017 (val)	APm 0.437	1275
Semantic segmentation	ADE20K	mIoU 49.6	1028
Image Classification	ImageNet A	Top-1 Acc 36.8	698
Image Classification	ImageNet 1k (test)	Top-1 Accuracy 84	456
Object Detection	COCO 2017	AP (Box) 48.8	345
Image Classification	ObjectNet	--	251
Instance Segmentation	COCO 2017	APm 43.7	236
JPEG artifact reduction	LIVE1	PSNR 34.64	142

Showing 10 of 13 rows
