SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
About
We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples. The first stage of the network is a generator whose weights are learned by back-propagating a binary cross-entropy (BCE) loss computed over downsampled versions of the saliency maps. The resulting prediction is then processed by a discriminator network trained on a binary classification task: distinguishing the saliency maps produced by the generator from the ground-truth maps. Our experiments show that adversarial training, combined with a widely used loss function such as BCE, reaches state-of-the-art performance across different metrics. Our results can be reproduced with the source code and trained models available at https://imatge-upc.github.io/saliency-salgan-2017/.
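The generator objective described above can be sketched as a weighted sum of a content term (BCE over downsampled saliency maps) and an adversarial term (the discriminator's output on the generated map pushed toward the "real" label). The NumPy sketch below is a minimal illustration, not the released implementation; the pooling factor and the weight `alpha` are assumptions chosen for the example, and `d_out_fake` stands in for the discriminator's probability that the generated map is real.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross entropy averaged over all pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def downsample(saliency, factor=4):
    """Average-pool a square-divisible saliency map by `factor` (assumed setting)."""
    h, w = saliency.shape
    return saliency.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def generator_loss(pred_map, gt_map, d_out_fake, alpha=0.005):
    """Content BCE on downsampled maps plus adversarial term.

    `alpha` weights the content term; `d_out_fake` is the discriminator's
    probability that the generated map is real (hypothetical scalar input).
    """
    content = bce(downsample(pred_map), downsample(gt_map))
    adversarial = -np.log(np.clip(d_out_fake, 1e-7, None))
    return alpha * content + adversarial
```

In this sketch the adversarial term rewards the generator when the discriminator is fooled (`d_out_fake` near 1), while the content term keeps predictions anchored to the ground-truth saliency distribution.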
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video saliency prediction | DHF1K (test) | AUC-J | 0.866 | 89 |
| Video saliency prediction | Hollywood-2 (test) | SIM | 0.393 | 83 |
| Video saliency prediction | UCF Sports (test) | SIM | 0.332 | 71 |
| Saliency Prediction | MIT300 (test) | CC | 0.73 | 56 |
| Visual Saliency Prediction | SALICON (test) | CC | 0.781 | 12 |
| Saliency Prediction | DHF1K | Model Size (MB) | 130 | 12 |
| Affordance Grounding | OPRA 28 x 28 (test) | KLD | 2.12 | 11 |
| Affordance Grounding | EPIC-Hotspots 28 x 28 (test) | KLD | 1.51 | 10 |
| Grounded affordance prediction | OPRA (seen classes) | KLD | 2.116 | 9 |
| Affordance Grounding | OPRA (test) | KLD | 2.116 | 9 |