SUM: Saliency Unification through Mamba for Visual Attention Modeling

About

Visual attention modeling, which is important for interpreting and prioritizing visual stimuli, plays a significant role in applications such as marketing, multimedia, and robotics. Traditional saliency prediction models, especially those based on Convolutional Neural Networks (CNNs) or Transformers, achieve notable success by leveraging large-scale annotated datasets. However, the current state-of-the-art (SOTA) models that use Transformers are computationally expensive. Additionally, a separate model is often required for each image type, so there is no unified approach. In this paper, we propose Saliency Unification through Mamba (SUM), a novel approach that integrates the efficient long-range dependency modeling of Mamba with U-Net to provide a unified model for diverse image types. Using a novel Conditional Visual State Space (C-VSS) block, SUM dynamically adapts to various image types, including natural scenes, web pages, and commercial imagery. Our comprehensive evaluations across five benchmarks demonstrate that SUM adapts to different visual characteristics and consistently outperforms existing models. These results position SUM as a versatile tool for advancing visual attention modeling across different types of visual content.

Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, Babak Taati • 2024

Related benchmarks

Task	Dataset	Metric	Value	
Saliency Prediction	SALICON (test)	NSS	1.981	25
Saliency Prediction	SalECI E-Commercial	CC	0.789	21
Saliency Prediction	U-EYE Web page	CC	0.731	17
Visual Attention Prediction	ObjectVisA 120 (test)	CC	0.4722	16
Saliency Prediction	MIT1003 Natural scene	CC	0.768	13
Image Attention Modeling	CAT2000 Natural	CC	0.882	13
Image Attention Modeling	OSIE Natural	CC	0.861	12
Image Attention Modeling	SALICON Natural	CC	90.9	12
Saliency Prediction	CAT2000 Natural scene	CC	0.882	8
Saliency Prediction	OSIE Natural scene	CC	0.861	7
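
The results above are reported as CC (linear correlation coefficient) and NSS (normalized scanpath saliency), two standard saliency metrics. As a minimal sketch of how these are commonly defined, and not the evaluation code used for the paper, the following assumes maps are given as plain 2D lists of numbers; the function names `cc` and `nss` are illustrative.

```python
from statistics import fmean, pstdev


def _flatten(grid):
    """Flatten a 2D list (rows of numbers) into one list of floats."""
    return [float(v) for row in grid for v in row]


def _zscore(values):
    """Standardize values to zero mean and unit (population) std."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("map is constant; the score is undefined")
    return [(v - mu) / sigma for v in values]


def cc(pred, gt_density):
    """Pearson correlation between a predicted saliency map and a
    ground-truth fixation density map of the same shape."""
    p, g = _flatten(pred), _flatten(gt_density)
    if len(p) != len(g):
        raise ValueError("maps must have the same number of pixels")
    return fmean([a * b for a, b in zip(_zscore(p), _zscore(g))])


def nss(pred, fixations):
    """Mean of the standardized prediction at fixated pixels.
    `fixations` is a binary map: nonzero where a fixation landed."""
    p, f = _flatten(pred), _flatten(fixations)
    if len(p) != len(f):
        raise ValueError("maps must have the same number of pixels")
    hits = [z for z, fix in zip(_zscore(p), f) if fix > 0]
    if not hits:
        raise ValueError("fixation map contains no fixations")
    return fmean(hits)


if __name__ == "__main__":
    # Toy 2x2 example (hypothetical values, for illustration only).
    pred = [[0.0, 1.0], [2.0, 3.0]]
    density = [[0.1, 0.2], [0.3, 0.9]]
    fixations = [[0, 0], [0, 1]]
    print("CC :", cc(pred, density))
    print("NSS:", nss(pred, fixations))
```

Under this definition CC lies in [-1, 1], with 1 meaning the predicted and ground-truth maps are perfectly linearly related. NSS is unbounded above, and 0 corresponds to chance. The CC entry of 90.9 for SALICON Natural therefore appears to be on a percentage scale, although the listing does not say so.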

Showing 10 of 12 rows
