Generative Transformer for Accurate and Reliable Salient Object Detection

About

The transformer, which originated in machine translation, is particularly powerful at modeling long-range dependencies. It is now driving rapid progress in a variety of vision tasks, with significant performance improvements over convolutional neural network (CNN) based frameworks. In this paper, we conduct extensive research on exploiting transformers for accurate and reliable salient object detection. For accuracy, we apply the transformer to a deterministic model and explain that its effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks. For reliability, we observe that both CNN- and transformer-based frameworks suffer greatly from over-confidence, where models tend to generate wrong predictions with high confidence. To estimate the reliability of both CNN- and transformer-based frameworks, we further present a latent variable model, the inferential generative adversarial network (iGAN), based on the generative adversarial network (GAN). The stochastic nature of the latent variable makes it convenient to estimate predictive uncertainty, which serves as an auxiliary output for evaluating the reliability of the model prediction. Unlike the conventional GAN, which defines the distribution of the latent variable as a fixed standard normal distribution $\mathcal{N}(0,\mathbf{I})$, the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply the proposed iGAN to both fully and weakly supervised salient object detection, and explain that iGAN within the transformer framework leads to both accurate and reliable salient object detection.

Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes • 2021
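
The abstract describes inferring the latent variable with Langevin dynamics instead of sampling it from a fixed prior. The following is a minimal sketch of that idea on a toy one-dimensional problem, not the paper's implementation. The toy model (z ~ N(0, 1), y | x, z ~ N(x * z, sigma^2)), the function names, the step size, and the number of steps are all assumptions made for illustration. In the saliency setting, each inferred sample would presumably be decoded into a prediction, and the spread across predictions would serve as the uncertainty estimate.

```python
import random


def grad_log_joint(z, x, y, sigma=1.0):
    """Gradient w.r.t. z of log p(z) + log p(y | x, z) for the toy model.

    Assumed toy model (not from the paper):
    z ~ N(0, 1) and y | x, z ~ N(x * z, sigma^2).
    """
    return -z + x * (y - x * z) / sigma ** 2


def langevin_infer(x, y, steps=200, step_size=0.1, rng=None):
    """Draw one approximate posterior sample of z with Langevin dynamics.

    Update rule: z <- z + (s^2 / 2) * grad log p(z, y | x) + s * noise,
    with noise ~ N(0, 1). The step size s and step count are illustrative.
    """
    rng = rng or random.Random()
    z = rng.gauss(0.0, 1.0)  # initialize from the standard normal prior
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        z = z + 0.5 * step_size ** 2 * grad_log_joint(z, x, y) + step_size * noise
    return z


def posterior_spread(x, y, num_samples=500, rng=None):
    """Run independent chains and return the sample mean and variance of z."""
    rng = rng or random.Random()
    samples = [langevin_infer(x, y, rng=rng) for _ in range(num_samples)]
    mean = sum(samples) / num_samples
    variance = sum((s - mean) ** 2 for s in samples) / num_samples
    return mean, variance


if __name__ == "__main__":
    mean, variance = posterior_spread(2.0, 3.0, rng=random.Random(0))
    print(f"mean={mean:.3f} variance={variance:.3f}")
```

For this toy model the exact posterior is Gaussian with mean x * y / (sigma^2 + x^2) and variance sigma^2 / (sigma^2 + x^2), so with x = 2, y = 3, sigma = 1 the samples should cluster around 1.2 with variance close to 0.2. Because the inferred z depends on x and y, the latent variable is input-dependent, in contrast to a draw from a fixed $\mathcal{N}(0,\mathbf{I})$.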

Related benchmarks

Task	Dataset	Result
Salient Object Detection	DUTS (test)	M (MAE) 0.025	357
Salient Object Detection	ECSSD	MAE 0.025	226
Salient Object Detection	PASCAL-S	MAE 0.053	196
Salient Object Detection	HKU-IS	MAE 0.023	179
Salient Object Detection	PASCAL-S (test)	MAE 0.05	149
Salient Object Detection	HKU-IS (test)	MAE 0.022	137
Salient Object Detection	ECSSD (test)	S-measure (Sa) 0.943	104
Salient Object Detection	DUT-OMRON (test)	MAE 0.048	92
Audio-Visual Segmentation	AVSBench S4 v1 (test)	MJ 61.6	55
Salient Object Detection	DUTS	F-beta Score 87.3	52

Showing 10 of 17 rows
