
SalFormer360: a transformer-based saliency estimation model for 360-degree videos

About

Saliency estimation has received growing attention in recent years due to its importance in a wide range of applications. In the context of 360-degree video, it has been particularly valuable for tasks such as viewport prediction and immersive content optimization. In this paper, we propose SalFormer360, a novel saliency estimation model for 360-degree videos built on a transformer-based architecture. Our approach combines an existing encoder, SegFormer, with a custom decoder. SegFormer was originally developed for 2D segmentation tasks, and we fine-tune it to adapt it to 360-degree content. To further enhance prediction accuracy, we incorporate a Viewing Center Bias that reflects typical user attention in 360-degree environments. Extensive experiments on the three largest benchmark datasets for saliency estimation demonstrate that SalFormer360 outperforms existing state-of-the-art methods. In terms of Pearson Correlation Coefficient, our model surpasses the previous state of the art by 8.4% on Sport360, 2.5% on PVS-HM, and 18.6% on VR-EyeTracking.
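The pipeline the abstract describes can be sketched in a few lines of PyTorch. The following is a minimal illustrative sketch, not the authors' released implementation: it loads a pretrained SegFormer (MiT-B0) encoder via Hugging Face transformers, attaches a hypothetical lightweight decoder, and fuses an assumed Gaussian-over-latitude form of the Viewing Center Bias. The actual decoder design, bias formulation, checkpoint, and hyperparameters are not specified on this page.

```python
# Minimal sketch of the architecture described above (assumptions throughout:
# decoder layout, bias shape, and all hyperparameters are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import SegformerModel


def viewing_center_bias(h: int, w: int, sigma: float = 0.2) -> torch.Tensor:
    """Gaussian prior over latitude: viewers of 360-degree video tend to look
    near the equator of the equirectangular frame. `sigma`, expressed as a
    fraction of the frame height, is an assumed value."""
    lat = torch.linspace(-0.5, 0.5, h).view(h, 1).expand(h, w)
    return torch.exp(-(lat ** 2) / (2 * sigma ** 2))  # shape (h, w)


class SalFormer360Sketch(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # MiT-B0 backbone; its four stages emit feature maps with
        # 32/64/160/256 channels at strides 4/8/16/32.
        self.encoder = SegformerModel.from_pretrained("nvidia/mit-b0")
        self.proj = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in (32, 64, 160, 256)
        )
        # Hypothetical fusion head; the paper's custom decoder is not public.
        self.head = nn.Sequential(
            nn.Conv2d(4 * embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, 1, kernel_size=1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, _, h, w = frames.shape
        stages = self.encoder(frames, output_hidden_states=True).hidden_states
        # Project every stage to a shared width and upsample to 1/4 scale.
        target = stages[0].shape[-2:]
        fused = torch.cat(
            [F.interpolate(p(s), size=target, mode="bilinear", align_corners=False)
             for p, s in zip(self.proj, stages)],
            dim=1,
        )
        logits = F.interpolate(self.head(fused), size=(h, w),
                               mode="bilinear", align_corners=False)
        # Fuse the viewing-center prior additively in log space (an assumed
        # fusion rule), then squash to a saliency map.
        prior = viewing_center_bias(h, w).to(logits).log().view(1, 1, h, w)
        return torch.sigmoid(logits + prior)


# Usage: one 512x1024 equirectangular frame in, one saliency map out.
model = SalFormer360Sketch().eval()
with torch.no_grad():
    saliency = model(torch.rand(1, 3, 512, 1024))
print(saliency.shape)  # torch.Size([1, 1, 512, 1024])
```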

Mahmoud Z. A. Wahba, Francesco Barbato, Sara Baldoni, Federica Battisti • 2026

Related benchmarks

Task                                   Dataset          Metric       Result   Rank
Saliency Prediction                    Sport360         CC           0.722    15
Saliency Prediction                    PVS-HM           CC           0.807    15
Saliency Prediction                    VR-EyeTracking   CC           0.593    9
360-degree video saliency prediction   General          Params (M)   3.7      7
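The saliency results above are all reported as Pearson Correlation Coefficient (CC) between predicted and ground-truth saliency maps. For reference, a minimal NumPy computation of CC is shown below; the datasets' official evaluation scripts may apply additional preprocessing (e.g., map normalisation), so this is a sketch of the metric, not the benchmarks' exact protocol.

```python
import numpy as np

def pearson_cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Linear correlation between two saliency maps of equal shape."""
    p = pred.ravel().astype(np.float64)
    g = gt.ravel().astype(np.float64)
    p -= p.mean()  # center both maps before correlating
    g -= g.mean()
    return float((p @ g) / (np.linalg.norm(p) * np.linalg.norm(g) + 1e-12))

# A map correlates perfectly with any positive affine transform of itself:
m = np.random.rand(64, 128)
print(pearson_cc(m, 2 * m + 0.1))  # ~1.0
```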
