TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
About
Salient object detection is a pixel-level dense prediction task that highlights the most prominent objects in a scene. The U-Net framework has recently been widely used for this task: its successive convolution and pooling operations generate multi-level features that complement each other. Because high-level features contribute more to detection performance, we propose a triplet transformer embedding module that enhances them by learning long-range dependencies across layers. This is the first work to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection, pushing performance to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
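The abstract does not give layer sizes or the exact form of each module, so the following NumPy sketch only illustrates the central idea of the triplet embedding: a single transformer encoder whose weights are reused for three feature levels. Everything else is an assumption and not the authors' implementation: the scale adjustment is modeled as average pooling or nearest-neighbor upsampling plus a linear channel projection, depth features are "attached" by element-wise addition, the encoder is a single-head pre-norm layer, and all names (`SharedEncoder`, `scale_adjust`, `triplet_embedding`) and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize each token (row) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class SharedEncoder:
    """Hypothetical single-head, pre-norm transformer encoder layer."""

    def __init__(self, dim, hidden, rng):
        s = 1.0 / np.sqrt(dim)
        self.wq = rng.normal(0, s, (dim, dim))
        self.wk = rng.normal(0, s, (dim, dim))
        self.wv = rng.normal(0, s, (dim, dim))
        self.w1 = rng.normal(0, s, (dim, hidden))
        self.w2 = rng.normal(0, 1.0 / np.sqrt(hidden), (hidden, dim))

    def __call__(self, tokens):
        # tokens: (N, dim); self-attention lets every position attend to every other.
        h = layer_norm(tokens)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) attention weights
        tokens = tokens + attn @ v                       # residual attention block
        h = layer_norm(tokens)
        return tokens + np.maximum(h @ self.w1, 0) @ self.w2  # residual ReLU MLP

def scale_adjust(feat, size, proj):
    # Hypothetical scale adjustment: resize a square (C, H, W) map to (C, size, size),
    # then project channels with proj of shape (dim, C); returns (size*size, dim) tokens.
    c, h, w = feat.shape
    assert h == w and (h % size == 0 or size % h == 0)
    if h >= size:
        f = h // size  # average pooling over non-overlapping f x f blocks
        feat = feat.reshape(c, size, f, size, f).mean(axis=(2, 4))
    else:
        f = size // h  # nearest-neighbor upsampling by an integer factor
        feat = feat.repeat(f, axis=1).repeat(f, axis=2)
    return (proj @ feat.reshape(c, -1)).T

def triplet_embedding(rgb_feats, depth_feats, encoder, projs, size):
    outputs = []
    for rgb, dep, proj in zip(rgb_feats, depth_feats, projs):
        fused = rgb + dep  # assumption: depth attached to color by addition
        tokens = scale_adjust(fused, size, proj)
        outputs.append(encoder(tokens))  # same encoder object, so weights are shared
    return outputs

# Hypothetical (channels, spatial size) for the three feature levels.
levels = [(32, 16), (64, 8), (128, 4)]
dim, size = 16, 8
rgb = [rng.normal(size=(c, s, s)) for c, s in levels]
depth = [rng.normal(size=(c, s, s)) for c, s in levels]
projs = [rng.normal(0, 1.0 / np.sqrt(c), (dim, c)) for c, _ in levels]
enc = SharedEncoder(dim, 2 * dim, rng)
outs = triplet_embedding(rgb, depth, enc, projs, size)
print([o.shape for o in outs])  # [(64, 16), (64, 16), (64, 16)]
```

Reusing one `SharedEncoder` instance is what makes the three encoders share weights: each level gets its own attention pass, but all levels are transformed by the same parameters, which keeps the parameter count equal to that of a single encoder.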
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Salient Object Detection | USOD10k | S-measure (S_α) | 0.7889 | 40 |
| Underwater Salient Object Detection | USOD10k 1.0 (test) | S-measure (S_α) | 0.7889 | 21 |