Task-Customized Mixture of Adapters for General Image Fusion
About
General image fusion aims to integrate important information from multi-source images. However, because of the significant cross-task gap, the appropriate fusion mechanism varies considerably from task to task, which limits performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, which adaptively prompts various fusion tasks in a unified model. We borrow the insight from the mixture of experts (MoE), treating the experts as efficient tuning adapters that prompt a pre-trained foundation model. These adapters are shared across tasks and constrained by mutual information regularization, ensuring compatibility with different tasks while preserving complementarity across multi-source images. Task-specific routing networks customize these adapters to extract task-specific information from the different sources with dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, TC-MoA controls the dominant intensity bias for different fusion tasks, successfully unifying multiple fusion tasks in a single model. Extensive experiments show that TC-MoA outperforms competing approaches in learning commonalities while retaining compatibility for general image fusion (multi-modal, multi-exposure, and multi-focus), and that it demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA.
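The routing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the class name `MixtureOfAdapters`, the methods `prompt` and `fuse`, the random linear adapters, and the per-channel softmax used as a stand-in for "dominant intensity" are all assumptions made for illustration, and the mutual information regularization is omitted. It only shows adapters shared across tasks, one router per task producing gates over those adapters, and a prompt-weighted blend of two source features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MixtureOfAdapters:
    """Hypothetical sketch: shared adapters, one routing matrix per task."""

    def __init__(self, dim, num_adapters, tasks, seed=0):
        rng = random.Random(seed)
        # Adapters are shared by every task (assumed here to be linear maps).
        self.adapters = [random_matrix(dim, dim, rng) for _ in range(num_adapters)]
        # Each task owns its router, which scores the shared adapters.
        self.routers = {t: random_matrix(num_adapters, dim, rng) for t in tasks}

    def prompt(self, task, feature):
        """Return the gate-weighted mix of adapter outputs and the gates."""
        gates = softmax(matvec(self.routers[task], feature))
        outputs = [matvec(A, feature) for A in self.adapters]
        mixed = [
            sum(g * out[i] for g, out in zip(gates, outputs))
            for i in range(len(feature))
        ]
        return mixed, gates

    def fuse(self, task, feat_a, feat_b):
        """Blend two source features with per-channel weights from the prompts."""
        prompt_a, _ = self.prompt(task, feat_a)
        prompt_b, _ = self.prompt(task, feat_b)
        fused, weights = [], []
        for fa, fb, pa, pb in zip(feat_a, feat_b, prompt_a, prompt_b):
            # Assumed stand-in for dominant intensity: softmax over the two sources.
            wa, wb = softmax([pa, pb])
            weights.append((wa, wb))
            fused.append(wa * fa + wb * fb)
        return fused, weights


if __name__ == "__main__":
    model = MixtureOfAdapters(dim=4, num_adapters=3, tasks=["multi_modal", "multi_focus"])
    infrared = [0.9, 0.1, 0.4, 0.7]
    visible = [0.2, 0.8, 0.5, 0.3]
    for task in model.routers:
        fused, weights = model.fuse(task, infrared, visible)
        print(task, [round(v, 3) for v in fused])
```

Because the two tasks use different routers over the same adapters, the same pair of inputs generally yields different source weights per task, which is the behaviour the abstract refers to as task-customized routing.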
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | MFNet (test) | mIoU | 58.87 | 134 |
| Semantic segmentation | FMB (test) | mIoU | 57.72 | 59 |
| Visible-Infrared Image Fusion | MSRS (test) | Average Gradient (AG) | 3.251 | 43 |
| Infrared-Visible Image Fusion | RoadScene (test) | Average Gradient (AG) | 5.339 | 40 |
| Object Detection | LLVIP (test) | mAP50 | 96.1 | 38 |
| Multi-Exposure Image Fusion | MEFB | Standard Deviation (SD) | 50.27 | 30 |
| Infrared and Visible Image Fusion | RoadScene | MI | 2.853 | 28 |
| Infrared-Visible Image Fusion | LLVIP (test) | EN | 7.4 | 23 |
| Infrared-Visible Image Fusion | MSRS | Entropy (EN) | 6.633 | 23 |
| Medical image fusion | MRI-PET (test) | Entropy (EN) | 4.83 | 16 |