
Unified Image and Video Saliency Modeling

About

Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data, and between different video saliency datasets, as a key challenge for effective joint modeling. To address this, we propose four novel domain adaptation techniques (Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing, and Bypass-RNN), in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state of the art for image saliency datasets, despite faster runtime and a 5- to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies which confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal.
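To make the learned Gaussian priors concrete, the sketch below renders a per-domain 2-D Gaussian prior map, keeping one set of Gaussian parameters per dataset. This is a minimal illustration, not the UNISAL implementation: in the actual model these parameters are learnable and the prior maps are fused with network features, and the parameter values and the helper names (`gaussian_prior_map`, `prior_for_domain`) here are assumptions for illustration.

```python
import numpy as np

def gaussian_prior_map(h, w, mu, sigma):
    """Render a 2-D axis-aligned Gaussian prior map of shape (h, w).

    mu and sigma are (y, x) pairs in normalized [0, 1] coordinates.
    """
    ys = np.linspace(0.0, 1.0, h)[:, None]   # shape (h, 1)
    xs = np.linspace(0.0, 1.0, w)[None, :]   # shape (1, w)
    return np.exp(-((ys - mu[0]) ** 2 / (2 * sigma[0] ** 2)
                    + (xs - mu[1]) ** 2 / (2 * sigma[1] ** 2)))

# One set of Gaussian parameters per domain (illustrative values only;
# in a trained model these would be learned from each dataset).
domain_params = {
    "SALICON": {"mu": (0.5, 0.5), "sigma": (0.25, 0.30)},
    "DHF1K":   {"mu": (0.5, 0.5), "sigma": (0.20, 0.35)},
}

def prior_for_domain(domain, h=45, w=80):
    """Look up a domain's parameters and render its prior map."""
    p = domain_params[domain]
    return gaussian_prior_map(h, w, p["mu"], p["sigma"])
```

The center-biased map this produces reflects the well-known photographer bias in saliency data; keeping separate parameters per domain lets the prior adapt to, e.g., the wider spatial spread of video fixations versus image fixations.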

Richard Droste, Jianbo Jiao, J. Alison Noble • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video saliency prediction | DHF1K (test) | AUC-J | 0.901 | 89 |
| Video saliency prediction | Hollywood-2 (test) | SIM | 0.544 | 83 |
| Video saliency prediction | UCF Sports (test) | SIM | 0.523 | 71 |
| Saliency prediction | MIT300 (test) | CC | 0.784 | 56 |
| No-reference video quality assessment | LIVE-VQC | SRCC | 0.872 | 50 |
| No-reference video quality assessment | YouTube-UGC | SRCC | 0.875 | 47 |
| No-reference video quality assessment | KoNViD-1k | SRCC | 0.859 | 42 |
| Saliency prediction | SALICON (test) | NSS | 1.952 | 25 |
| Video quality assessment | LIVE-VQC, KoNViD-1k, YouTube-UGC (weighted average) | SROCC | 0.87 | 23 |
| Visual saliency prediction | CAT2000 (test) | CC | 0.8417 | 19 |

Showing 10 of 22 rows.
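The saliency metrics in the table (CC, SIM, NSS) have standard definitions, sketched below with NumPy. These follow the usual formulations from the saliency evaluation literature and are not taken from the UNISAL codebase; the epsilon terms for numerical stability are an assumption of this sketch.

```python
import numpy as np

def cc(pred, gt):
    """Linear Correlation Coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred, gt):
    """Similarity: sum of element-wise minima of the two maps,
    each first normalized to sum to 1 (i.e. histogram intersection)."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean value of the standardized
    prediction map at the binary ground-truth fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations.astype(bool)].mean())
```

CC and SIM compare a prediction against a blurred fixation density map, while NSS (the SALICON metric above) is computed directly at discrete fixation points; AUC-J and SRCC, also listed in the table, are ranking-based and are typically computed with dedicated evaluation code.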

Other info

Code

https://github.com/rdroste/unisal