
Cross-Modality Fusion Transformer for Multispectral Object Detection

About

Multispectral image pairs provide combined information that makes object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT). Unlike prior CNN-based works, our network is guided by the transformer scheme: it learns long-range dependencies and integrates global contextual information in the feature extraction stage. More importantly, by leveraging the self-attention of the transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion and robustly capture the latent interactions between the RGB and thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance. Our code and models are available at https://github.com/DocF/multispectral-object-detection.
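
The idea of simultaneous intra-modality and inter-modality fusion can be illustrated with a toy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product self-attention with identity query/key/value projections, and the token values and the names `rgb_tokens`, `thermal_tokens`, and `self_attention` are hypothetical. Concatenating the tokens of both modalities into one sequence means each attention row covers same-modality tokens (intra) and other-modality tokens (inter) at once.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: Q, K and V are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    fused = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return fused, weights


# Hypothetical toy features: 2 RGB tokens and 2 thermal tokens, 3 channels each.
rgb_tokens = [[1.0, 0.0, 0.5], [0.2, 0.8, 0.1]]
thermal_tokens = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.3]]

# One joint sequence: attention now spans both modalities.
tokens = rgb_tokens + thermal_tokens
fused, weights = self_attention(tokens)

n_rgb = len(rgb_tokens)
# Attention mass that RGB token 0 places on RGB tokens vs thermal tokens.
intra = sum(weights[0][:n_rgb])
inter = sum(weights[0][n_rgb:])
print(f"RGB token 0: intra={intra:.3f}, inter={inter:.3f}")

# Split the fused sequence back into per-modality features.
fused_rgb, fused_thermal = fused[:n_rgb], fused[n_rgb:]
```

In this sketch the attention matrix has four blocks (RGB to RGB, RGB to thermal, thermal to RGB, thermal to thermal), which is one way to read the abstract's claim that a single self-attention step covers both kinds of fusion.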

Fang Qingyun, Han Dapeng, Wang Zhaokui • 2021

Related benchmarks

Task              Dataset                                  Result         Rank
Object Detection  FLIR (test)                              mAP50 0.787    83
Object Detection  OGSOD 2.0 (test)                         mAP50 79       77
Object Detection  SpaceNet6 OTD-Fog (test)                 mAP50 80.2     77
Object Detection  DroneVehicle (test)                      mAP50 61.3     61
Object Detection  LLVIP                                    mAP50 97.5     58
Object Detection  FLIR                                     mAP 40.2       40
Object Detection  DroneVehicle                             mAP 61.3       35
Object Detection  FLIR Aligned (test)                      mAP@0.5 78.7   26
Object Detection  VeDAI                                    mAP 35.4       14
Object Detection  FLIR relabeled version by Zhang (test)   mAP 40.2       11
Showing 10 of 11 rows
