A Deep Multi-Level Network for Saliency Prediction

About

This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We instead propose an architecture that combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network that weights low- and high-level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms them on every evaluation metric on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.

Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara • 2016
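
The three blocks named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`upsample_nearest`, `encode_multilevel`, `apply_prior`), the use of a single 1x1-style weighting with a ReLU, the nearest-neighbour upsampling, and the multiplicative prior are all assumptions made for illustration, and random arrays stand in for the output of the feature extraction CNN.

```python
import numpy as np

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def encode_multilevel(features, weights, bias=0.0):
    """Weight and sum channels drawn from several CNN levels.

    features: list of (C_i, H, W) arrays already at a common resolution.
    weights:  1-D array with one weight per stacked channel
              (equivalent to a single-output 1x1 convolution).
    """
    stacked = np.concatenate(features, axis=0)                # (sum C_i, H, W)
    combined = np.tensordot(weights, stacked, axes=1) + bias  # (H, W)
    return np.maximum(combined, 0.0)                          # ReLU

def apply_prior(saliency, prior_lowres):
    """Modulate a saliency map with a coarse prior (assumed multiplicative).

    Assumes square maps whose side is a multiple of the prior's side.
    """
    factor = saliency.shape[0] // prior_lowres.shape[0]
    prior = upsample_nearest(prior_lowres[None], factor)[0]
    return saliency * prior

# Hypothetical feature maps standing in for three CNN levels.
rng = np.random.default_rng(0)
low = rng.random((4, 8, 8))
mid = upsample_nearest(rng.random((8, 4, 4)), 2)
high = upsample_nearest(rng.random((16, 2, 2)), 4)

weights = rng.random(4 + 8 + 16)   # would be learned in a real model
prior_lowres = rng.random((2, 2))  # would be learned in a real model

saliency = apply_prior(encode_multilevel([low, mid, high], weights), prior_lowres)
print(saliency.shape)
```

The point of the sketch is only the data flow: maps from several depths are brought to one resolution, weighted jointly, and then adjusted by a prior, rather than predicting saliency from the last layer alone.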

Related benchmarks

Task	Dataset	Metric	Value
Saliency Prediction	MIT300 (test)	CC	0.67	56
Visual Saliency Prediction	CAT2000 (test)	Correlation Coefficient (CC)	0.5221	19
Saliency Prediction	MIT1003 (test)	NSS	2.3329	18
Distortion-aware saliency prediction	GenBlemish-27K	AUC-Judd	0.8539	17
Driver Visual Attention Prediction	TrafficGaze (test)	KLD	0.87	16
Driver Visual Attention Prediction	BDD-A (test)	KLD	1.2	15
Driver Visual Attention Prediction	DADA 2000 (test)	KLD	11.78	15
Affordance Grounding	AGD20k v1 (Seen)	KLD	5.197	14
Affordance Grounding	AGD20k v1 (Unseen)	KLD	5.012	14
Driver Visual Attention Prediction	DrFixD rainy (test)	KLD	3.69	13

Showing 10 of 21 rows
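
For readers unfamiliar with the metrics in the table, the following is a minimal NumPy sketch of three of them (CC, NSS, and KLD) using their commonly cited definitions. The function names and the `EPS` constant are illustrative choices, and individual benchmarks may preprocess or normalize maps differently, so these functions are not guaranteed to reproduce the numbers listed above. AUC-Judd is omitted for brevity.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)

def cc(pred, gt):
    """Pearson correlation between predicted and ground-truth maps (higher is better)."""
    p = (pred - pred.mean()) / (pred.std() + EPS)
    g = (gt - gt.mean()) / (gt.std() + EPS)
    return float((p * g).mean())

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean standardized prediction at fixated pixels."""
    p = (pred - pred.mean()) / (pred.std() + EPS)
    return float(p[fixations.astype(bool)].mean())

def kld(pred, gt):
    """KL divergence of the prediction from the ground truth (lower is better)."""
    p = pred / (pred.sum() + EPS)
    g = gt / (gt.sum() + EPS)
    return float((g * np.log(EPS + g / (p + EPS))).sum())

# Tiny made-up maps, for illustration only.
gt_map = np.array([[0.0, 0.1], [0.2, 0.7]])
pred_map = np.array([[0.1, 0.1], [0.3, 0.5]])
fix_map = np.array([[0, 0], [0, 1]])

print(cc(pred_map, gt_map), nss(pred_map, fix_map), kld(pred_map, gt_map))
```

CC and NSS reward agreement, so larger values are better, while KLD measures dissimilarity between distributions, so smaller values are better.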
