LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
About
Current research on infrared and visible image fusion focuses primarily on improving fusion performance, often neglecting applicability to real-time fusion devices. In this paper, we propose a novel approach to extremely fast fusion via distillation to learnable look-up tables designed specifically for image fusion, termed LUT-Fuse. First, we develop a look-up table structure that uses low-order approximation encoding and high-level joint contextual scene encoding, which is well suited to multi-modal fusion. Moreover, given the lack of ground truth in multi-modal image fusion, we propose an efficient LUT distillation strategy in place of traditional quantization LUT methods. By distilling the performance of the multi-modal fusion network (MM-Net) into the MM-LUT model, our method achieves significant breakthroughs in efficiency and performance. It typically requires less than one-tenth of the time of current lightweight SOTA fusion algorithms, ensuring high operational speed across various scenarios, even on low-power mobile devices. Extensive experiments validate the superiority, reliability, and stability of our fusion approach. The code is available at https://github.com/zyb5/LUT-Fuse.
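To make the idea of distilling a fusion network into a learnable look-up table concrete, here is a minimal toy sketch. It is not the MM-LUT structure from the paper: it assumes a single per-pixel 2D table indexed by infrared and visible intensity, bilinear interpolation between table entries, and a hand-written `teacher` function standing in for MM-Net. All names (`BINS`, `teacher`, `lookup`, `distill`) and all hyperparameters are hypothetical. The point it illustrates is that no ground-truth fused image is needed, because the teacher's output serves as the training target.

```python
import random

BINS = 9  # hypothetical grid resolution per axis (not taken from the paper)

def teacher(ir, vis):
    """Stand-in for a fusion network: a blend that favours the brighter input."""
    return 0.5 * max(ir, vis) + 0.25 * (ir + vis)

def lookup(lut, ir, vis):
    """Bilinearly interpolate the 2D table at (ir, vis), both in [0, 1].

    Returns the interpolated value and the four (row, col, weight) corners.
    """
    x, y = ir * (BINS - 1), vis * (BINS - 1)
    i, j = min(int(x), BINS - 2), min(int(y), BINS - 2)
    fx, fy = x - i, y - j
    corners = [
        (i, j, (1 - fx) * (1 - fy)),
        (i + 1, j, fx * (1 - fy)),
        (i, j + 1, (1 - fx) * fy),
        (i + 1, j + 1, fx * fy),
    ]
    value = sum(lut[a][b] * w for a, b, w in corners)
    return value, corners

def distill(steps=40000, lr=0.5, seed=0):
    """Fit the table entries to the teacher with plain SGD on squared error."""
    rng = random.Random(seed)
    lut = [[0.0] * BINS for _ in range(BINS)]
    for _ in range(steps):
        ir, vis = rng.random(), rng.random()
        pred, corners = lookup(lut, ir, vis)
        err = pred - teacher(ir, vis)  # the teacher output is the only target
        for a, b, w in corners:
            # Gradient of 0.5 * err**2 with respect to entry (a, b) is err * w.
            lut[a][b] -= lr * err * w
    return lut

lut = distill()

# Fuse a tiny 2x2 image pair with table lookups only (no teacher at inference).
ir_img = [[0.1, 0.9], [0.5, 0.3]]
vis_img = [[0.8, 0.2], [0.5, 0.7]]
fused = [[lookup(lut, a, b)[0] for a, b in zip(r_ir, r_vis)]
         for r_ir, r_vis in zip(ir_img, vis_img)]
print(fused)
```

At inference time the sketch touches only four table entries per pixel, which is the kind of cost profile that makes LUT-based methods attractive on low-power devices. The table in the paper additionally encodes contextual scene information, which this per-pixel toy does not attempt.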
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | LLVIP | mAP50 | 94.1 | 109 |
| Semantic Segmentation | MSRS | mIoU | 73.6 | 93 |
| Object Detection | M3FD | AP@[0.5:0.95] | 41.65 | 45 |
| Infrared-Visible Image Fusion | MSRS | QAB/F | 0.579 | 38 |
| Image Fusion | Harvard Medicine Dataset (test) | Average Gradient (AG) | 6.549 | 20 |
| Image Fusion | LLVIP (test) | VIF | 0.464 | 11 |
| Image Fusion | LLVIP | Entropy (EN) | 6.894 | 11 |
| Image Fusion | RoadScene | EN Score | 6.959 | 11 |
| Image Fusion | Image Fusion efficiency evaluation 256x256 | Model Size (MB) | 0.0078 | 10 |