
ViSAGE @ NTIRE 2026 Challenge on Video Saliency Prediction

About

In this report, we present our champion solution for the NTIRE 2026 Challenge on Video Saliency Prediction held in conjunction with CVPR 2026. To exploit complementary inductive biases for video saliency, we propose Video Saliency with Adaptive Gated Experts (ViSAGE), a multi-expert ensemble framework. Each specialized decoder performs adaptive gating and modulation to refine spatio-temporal features. The complementary predictions from different experts are then fused at inference. ViSAGE thereby aggregates diverse inductive biases to capture complex spatio-temporal saliency cues in videos. On the Private Test set, ViSAGE ranked first on two out of four evaluation metrics, and outperformed most competing solutions on the other two metrics, demonstrating its effectiveness and generalization ability. Our code has been released at https://github.com/iLearn-Lab/CVPRW26-ViSAGE.
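The abstract states that the experts' complementary predictions are fused at inference but does not give the fusion rule. As a minimal sketch only, assuming a normalized weighted average over per-expert saliency maps (the function name `fuse_expert_maps`, the weights, and the list-of-lists map representation are all hypothetical and not taken from the ViSAGE code), the idea could look like this:

```python
from typing import List, Optional, Sequence

# A saliency map is represented here as a plain 2D list of floats (rows x columns).
SaliencyMap = List[List[float]]


def fuse_expert_maps(
    maps: Sequence[SaliencyMap],
    weights: Optional[Sequence[float]] = None,
) -> SaliencyMap:
    """Fuse per-expert saliency maps with a normalized weighted average.

    ASSUMPTION: weighted averaging is an illustrative stand-in; the source
    does not specify how ViSAGE combines its experts.
    """
    if not maps:
        raise ValueError("at least one expert map is required")
    if weights is None:
        weights = [1.0] * len(maps)  # default: uniform average
    if len(weights) != len(maps):
        raise ValueError("need exactly one weight per expert map")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    height, width = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all expert maps must share the same shape")

    # Pixel-wise weighted mean across experts.
    return [
        [
            sum(wt * m[y][x] for wt, m in zip(weights, maps)) / total
            for x in range(width)
        ]
        for y in range(height)
    ]


if __name__ == "__main__":
    expert_a = [[0.0, 1.0]]
    expert_b = [[1.0, 1.0]]
    print(fuse_expert_maps([expert_a, expert_b]))
    print(fuse_expert_maps([expert_a, expert_b], weights=[3.0, 1.0]))
```

Uniform weights treat every expert equally; unequal weights would typically be tuned on a validation set, which is again an assumption rather than something the abstract describes.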

Kun Wang, Yupeng Hu, Zhiran Li, Hao Liu, Qianlong Xiang, Liqiang Nie • 2026

Related benchmarks

Task | Dataset | Result | Rank
Video saliency prediction | NTIRE Video Saliency Prediction 2026 (private test) | CC 0.828 | 8
Video saliency prediction | DIEM | CC 0.679 | 8
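The CC values reported here are assumed to be the linear correlation coefficient commonly used in saliency evaluation, that is, the Pearson correlation between a predicted saliency map and the ground-truth map. A minimal sketch of that computation for a single frame follows; the function name `linear_cc` and the list-of-lists representation are illustrative assumptions, and the challenge's official evaluation script may differ in details such as normalization or averaging across frames.

```python
import math
from typing import List

SaliencyMap = List[List[float]]


def linear_cc(pred: SaliencyMap, gt: SaliencyMap) -> float:
    """Pearson correlation between two same-shaped saliency maps.

    Returns 0.0 when either map is constant, since the correlation is
    undefined in that case (this fallback is a convention chosen here).
    """
    p = [v for row in pred for v in row]  # flatten prediction
    g = [v for row in gt for v in row]    # flatten ground truth
    if not p or len(p) != len(g):
        raise ValueError("maps must be non-empty and have the same size")

    mean_p = sum(p) / len(p)
    mean_g = sum(g) / len(g)
    cov = sum((a - mean_p) * (b - mean_g) for a, b in zip(p, g))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_g = sum((b - mean_g) ** 2 for b in g)

    denom = math.sqrt(var_p * var_g)
    if denom == 0.0:
        return 0.0
    return cov / denom


if __name__ == "__main__":
    prediction = [[0.1, 0.4], [0.7, 0.9]]
    ground_truth = [[0.0, 0.5], [0.6, 1.0]]
    print(f"CC = {linear_cc(prediction, ground_truth):.3f}")
```

CC ranges from -1 to 1, with higher values indicating closer linear agreement between the prediction and the ground truth.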
