
Generative Transformer for Accurate and Reliable Salient Object Detection

About

The transformer, which originated in machine translation, is particularly powerful at modeling long-range dependencies. Transformers are now driving rapid progress across vision tasks, with significant performance improvements over convolutional neural network (CNN) based frameworks. In this paper, we conduct an extensive study of how transformers can contribute to accurate and reliable salient object detection. For accuracy, we apply a transformer to a deterministic model and explain that its effective structure modeling and global context modeling abilities lead to its superior performance compared with CNN-based frameworks. For reliability, we observe that both CNN- and transformer-based frameworks suffer greatly from the over-confidence issue, where the models tend to generate wrong predictions with high confidence.

To estimate the reliability of both CNN- and transformer-based frameworks, we further present a latent variable model, namely the inferential generative adversarial network (iGAN), based on the generative adversarial network (GAN). The stochastic nature of the latent variable makes it convenient to estimate predictive uncertainty, which serves as an auxiliary output for evaluating the reliability of the model's prediction. Unlike a conventional GAN, which defines the latent variable's distribution as a fixed standard normal distribution $\mathcal{N}(0,\mathbf{I})$, the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply iGAN to both fully and weakly supervised salient object detection and explain that iGAN within the transformer framework leads to both accurate and reliable salient object detection.
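The abstract names two mechanisms: inferring the latent variable with Langevin dynamics instead of fixing it to $\mathcal{N}(0,\mathbf{I})$, and reading predictive uncertainty off the spread of stochastic predictions. The following is a minimal sketch of both on a deliberately tiny toy problem, not the paper's network. Everything in it is an assumption for illustration: the hypothetical linear generator g(z) = A z + b, the Gaussian likelihood with noise level `sigma`, the step size, and the use of pixel-wise entropy of the mean prediction as the uncertainty measure. The update is the standard Langevin step $z \leftarrow z + \tfrac{s^2}{2}\nabla_z \log p(y, z) + s\,\epsilon$ with $\epsilon \sim \mathcal{N}(0,\mathbf{I})$.

```python
import numpy as np


def langevin_infer_latent(y, A, b, z0, step=0.1, n_steps=5000, sigma=1.0,
                          add_noise=True, rng=None):
    """Infer a latent code z for observation y with Langevin dynamics.

    Hypothetical toy model (an assumption, not the iGAN architecture):
      generator  g(z) = A @ z + b
      likelihood y ~ N(g(z), sigma^2 I)
      prior      z ~ N(0, I)
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        # Gradient of log p(y, z) = -||y - g(z)||^2 / (2 sigma^2) - ||z||^2 / 2
        grad = A.T @ (y - A @ z - b) / sigma**2 - z
        z = z + 0.5 * step**2 * grad
        if add_noise:
            # Langevin noise term; without it the loop is plain gradient ascent to the MAP.
            z = z + step * rng.standard_normal(z.shape)
    return z


def predictive_entropy(prob_maps, eps=1e-12):
    """Pixel-wise binary entropy of the mean of K sampled saliency maps, shape (K, H, W).

    Using entropy as the uncertainty measure is an assumption for this sketch.
    """
    p = np.clip(np.mean(prob_maps, axis=0), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
```

With `add_noise=False` the loop reduces to gradient ascent and converges to the MAP estimate, the solution of $(A^\top A/\sigma^2 + \mathbf{I})\,z = A^\top (y - b)/\sigma^2$. The injected noise is what turns it into an MCMC sampler, so repeated runs yield different latent codes and therefore different predictions whose disagreement can be summarized by `predictive_entropy`. In a real network the gradient would come from back-propagation through the generator rather than from the closed form used here.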

Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Salient Object Detection | DUTS (test) | MAE (M) | 0.025 | 302
Salient Object Detection | PASCAL-S (test) | MAE | 0.05 | 149
Salient Object Detection | HKU-IS (test) | MAE | 0.022 | 137
Salient Object Detection | ECSSD (test) | S-measure (Sα) | 0.943 | 104
Salient Object Detection | DUT-OMRON (test) | MAE | 0.048 | 92
Audio-Visual Segmentation | AVSBench S4 v1 (test) | MJ | 61.6 | 55
Audio-Visual Segmentation | AVSBench MS3 (test) | Jaccard Index (IoU) | 42.9 | 30
Sound Target Segmentation | AVSBench-object MS3 1.0 (test) | mIoU | 42.9 | 23
Audio-Visual Segmentation | AVS-Object-Single | J&F Score | 69.7 | 13
Audio-Visual Segmentation | AVS-Object-Multi | J&F Score | 48.7 | 13

10 of 11 benchmark rows are shown.
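Most rows above report mean absolute error (MAE) or a Jaccard/IoU score. The following is a minimal sketch of both metrics for a single image, assuming the prediction and ground truth are same-sized arrays with values in [0, 1] and a hypothetical binarization threshold of 0.5 for IoU. Official evaluation toolkits may differ in thresholding, in averaging (per image versus per dataset), and in how empty masks are handled, so this is not the benchmarks' reference code.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a continuous saliency map and ground truth in [0, 1]."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.mean(np.abs(pred - gt)))


def jaccard(pred, gt, threshold=0.5):
    """IoU (Jaccard index) of a thresholded prediction against a binary mask.

    The 0.5 threshold is an assumption for this sketch.
    """
    p = np.asarray(pred) >= threshold
    g = np.asarray(gt) >= 0.5
    union = np.logical_or(p, g).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match here (a convention, not a standard).
        return 1.0
    return float(np.logical_and(p, g).sum() / union)
```

Lower MAE is better and higher IoU is better. The audio-visual scores in the table appear to be reported as percentages, so an MJ of 61.6 would correspond to a mean Jaccard of 0.616.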
