
Deep Visual Attention Prediction

About

In this work, we aim to predict human eye fixations in view-free scenes using an end-to-end deep learning architecture. Although Convolutional Neural Networks (CNNs) have substantially improved human attention prediction, CNN-based attention models can still be improved by leveraging multi-scale features more efficiently. Our visual attention network captures hierarchical saliency information, from deep, coarse layers carrying global saliency information to shallow, fine layers carrying local saliency responses. The model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with different receptive fields. The final saliency prediction is obtained by combining these global and local predictions. The model is trained with deep supervision: supervision is fed directly into layers at multiple levels, unlike previous approaches that provide supervision only at the output layer and propagate it back to earlier layers. The model thus incorporates multi-level saliency predictions within a single network, which significantly reduces the redundancy of previous approaches that learn multiple network streams with different input scales. Extensive experimental analysis on various challenging benchmark datasets demonstrates that our method yields state-of-the-art performance with competitive inference time.
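The deep supervision idea described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, the upsampling method, or the fusion rule, so binary cross-entropy, nearest-neighbour upsampling, and a fixed weighted average are all assumptions made here for illustration, and every function name is hypothetical. The sketch shows only the structure: each side output (coarse to fine) receives its own loss against the ground truth, and the fused prediction receives one more.

```python
import math

def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor (assumed method)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equally sized 2-D maps (assumed loss)."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            n += 1
    return total / n

def fuse(maps, weights):
    """Weighted average of same-sized maps; a stand-in for a learned fusion layer."""
    norm = sum(weights)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[i][j] for wt, m in zip(weights, maps)) / norm
             for j in range(w)] for i in range(h)]

def deeply_supervised_loss(side_outputs, target, weights=None):
    """side_outputs: list of (saliency_map, upsample_factor), ordered coarse to fine.

    Returns (total_loss, fused_map). Every side output is supervised directly,
    and the fused prediction is supervised as well.
    """
    full = [upsample_nearest(m, f) for m, f in side_outputs]
    weights = weights or [1.0] * len(full)
    fused = fuse(full, weights)
    side_losses = [bce(m, target) for m in full]
    return sum(side_losses) + bce(fused, target), fused

# Toy example: a 4x4 ground truth and three side outputs at 1x1, 2x2, and 4x4.
target = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
coarse = [[0.25]]                  # global estimate from a deep layer
mid = [[1.0, 0.0], [0.0, 0.0]]     # intermediate layer
fine = [row[:] for row in target]  # local response from a shallow layer
loss, fused = deeply_supervised_loss([(coarse, 4), (mid, 2), (fine, 1)], target)
```

In a trained network the per-level maps would come from convolutional layers with different receptive fields and the fusion weights would be learned; here both are fixed by hand to keep the example self-contained.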

Wenguan Wang, Jianbing Shen • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Video saliency prediction | DHF1K (test) | AUC-J | 0.883 | 89 |
| Video saliency prediction | Hollywood-2 (test) | SIM | 0.497 | 83 |
| Video saliency prediction | UCF Sports (test) | SIM | 0.387 | 71 |
| Saliency Prediction | MIT300 (test) | CC | 0.68 | 56 |
| Saliency Prediction | DIEM (test) | SIM | 0.237 | 28 |
| Visual Saliency Prediction | CAT2000 (test) | Correlation Coefficient (CC) | 0.8616 | 19 |
| Saliency Prediction | MIT1003 (test) | NSS | 2.574 | 18 |
| Saliency Prediction | DHF1K | Model Size (MB) | 96 | 12 |
| Video saliency prediction | UCF Sports | NSS | 2.311 | 11 |
| Saliency Prediction | CAT2000 Natural scene | CC | 0.861 | 8 |
Showing 10 of 16 rows
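
For readers unfamiliar with the metrics in the table, the following sketch gives the commonly used definitions of CC, SIM, and NSS on small 2-D maps. It is a minimal illustration and not the evaluation code behind the numbers above: benchmark implementations differ in details such as resizing, blurring, and normalization, and the use of the population standard deviation in NSS is an assumption made here. All function names are hypothetical.

```python
import math

def _flatten(m):
    """Flatten a 2-D list into a 1-D list."""
    return [v for row in m for v in row]

def cc(pred, gt):
    """Pearson correlation coefficient between two saliency maps."""
    p, g = _flatten(pred), _flatten(gt)
    mp, mg = sum(p) / len(p), sum(g) / len(g)
    cov = sum((a - mp) * (b - mg) for a, b in zip(p, g))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sg = math.sqrt(sum((b - mg) ** 2 for b in g))
    return cov / (sp * sg)

def sim(pred, gt):
    """Similarity: histogram intersection of the maps after each is normalized to sum to 1."""
    p, g = _flatten(pred), _flatten(gt)
    sp, sg = sum(p), sum(g)
    return sum(min(a / sp, b / sg) for a, b in zip(p, g))

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated (row, col) locations."""
    p = _flatten(pred)
    mean = sum(p) / len(p)
    std = math.sqrt(sum((a - mean) ** 2 for a in p) / len(p))  # population std (assumption)
    return sum((pred[r][c] - mean) / std for r, c in fixations) / len(fixations)

# Toy maps: the prediction puts all of its mass on the single fixated pixel.
pred = [[0.0, 0.0], [0.0, 1.0]]
gt = [[0.0, 0.0], [0.5, 0.5]]
cc_value = cc(pred, gt)
sim_value = sim(pred, gt)
nss_value = nss(pred, [(1, 1)])
```

Higher is better for all three. CC and SIM compare the prediction against a continuous ground-truth saliency map, while NSS scores it against discrete fixation locations.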
