
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

About

Transformers have offered a new methodology of designing neural networks for visual recognition. Compared to convolutional networks, Transformers enjoy the ability of referring to global features at each stage, yet the attention module brings higher computational overhead that obstructs the application of Transformers to process high-resolution visual data. This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG). Hence, by manipulating these MSG tokens, one can flexibly exchange visual information across regions and the computational complexity is reduced. We then integrate the MSG token into a multi-scale architecture named MSG-Transformer. In standard image classification and object detection, MSG-Transformer achieves competitive performance and the inference on both GPU and CPU is accelerated. Code is available at https://github.com/hustvl/MSG-Transformer.

Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian • 2021
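
The abstract's core idea (one messenger token per region, with cross-region exchange done by manipulating those tokens) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the untrained single-head attention without learned projections, and the channel-shuffle exchange rule are all assumptions chosen for illustration; the linked repository is the reference. The sketch shows that attention restricted to each region cannot pass information between regions unless the MSG tokens are exchanged, and it counts attended token pairs for global versus region-local attention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(tokens):
    # tokens: (regions, tokens_per_region, dim). Attention is computed
    # inside each region only (no learned projections; illustration only).
    dim = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(dim)
    return softmax(scores) @ tokens

def shuffle_msg(msg):
    # Hypothetical exchange rule: split each MSG token into one channel
    # chunk per region and swap chunks, so every MSG token ends up holding
    # one chunk from every region. Requires dim divisible by regions.
    regions, dim = msg.shape
    chunks = msg.reshape(regions, regions, dim // regions)
    return chunks.transpose(1, 0, 2).reshape(regions, dim)

def msg_block(patches, msg, exchange=True):
    # Prepend each region's MSG token, attend locally, then split again.
    tokens = np.concatenate([msg[:, None, :], patches], axis=1)
    tokens = local_attention(tokens)
    msg, patches = tokens[:, 0, :], tokens[:, 1:, :]
    return patches, (shuffle_msg(msg) if exchange else msg)

def cross_region_effect(exchange, seed=0):
    # Perturb region 0 and measure how much region 1's output changes
    # after two blocks.
    rng = np.random.default_rng(seed)
    patches = rng.normal(size=(4, 16, 8))   # 4 regions, 16 patches, dim 8
    msg = rng.normal(size=(4, 8))
    perturbed = patches.copy()
    perturbed[0] += 1.0
    outputs = []
    for p in (patches, perturbed):
        m = msg
        for _ in range(2):
            p, m = msg_block(p, m, exchange=exchange)
        outputs.append(p[1])
    return float(np.abs(outputs[0] - outputs[1]).max())

def attention_pairs(regions, patches_per_region):
    # Token pairs scored by global attention vs. region-local attention
    # with one extra MSG token per region.
    n = regions * patches_per_region
    return n * n, regions * (patches_per_region + 1) ** 2

if __name__ == "__main__":
    print("effect on region 1 without exchange:", cross_region_effect(False))
    print("effect on region 1 with exchange:   ", cross_region_effect(True))
    print("global vs local attention pairs:    ", attention_pairs(4, 16))
```

Under these assumptions, region 1 is unaffected by a change in region 0 when the MSG tokens are left in place, and becomes sensitive to it once they are shuffled, while the number of attended pairs stays far below that of global attention.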

Related benchmarks

Task                   Dataset              Result              Rank
Instance Segmentation  COCO 2017 (val)      --                  1144
Object Detection       COCO 2017            AP (Box): 50.3      279
Object Detection       MS-COCO 2017 (val)   --                  237
Image Classification   ImageNet-1k (val)    Top-1 Accuracy: 84  50
