Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency

About

This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject's attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets, which are used in our experiments and which we will publicly release.

Eunji Chong, Nataniel Ruiz, Yongxin Wang, Yun Zhang, Agata Rozga, James Rehg • 2018

Related benchmarks

Task	Dataset	Result
Gaze target estimation	GazeFollow	Avg L2 Distance 0.137	67
Gaze target estimation	VideoAttentionTarget	L2 Distance 0.134	56
Gaze Following	VideoAttentionTarget	L2 Distance 0.171	38
Gaze Following	GazeFollowing	Minimum Distance 0.112	35
Gaze Following	GazeFollow (test)	AUC 0.896	24
Gaze Following	VideoAttentionTarget (test)	AUC 0.83	20
Gaze target estimation	GazeFollow360	Spherical Distance 0.9183	10
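
This page does not define the distance metrics in the table. The sketch below shows one common way such numbers are computed, under two assumptions that may not match this benchmark: gaze points are (x, y) pairs normalized to [0, 1] by image width and height, and each test image may carry several annotated gaze points. The function names are hypothetical and are not taken from the paper's code.

```python
import math

def l2_distance(pred, target):
    """Euclidean distance between two (x, y) points.

    Assumes both points are in normalized image coordinates.
    """
    return math.hypot(pred[0] - target[0], pred[1] - target[1])

def average_l2(pred, annotations):
    """Mean distance from one prediction to every annotated gaze point."""
    return sum(l2_distance(pred, a) for a in annotations) / len(annotations)

def minimum_l2(pred, annotations):
    """Distance from one prediction to the closest annotated gaze point."""
    return min(l2_distance(pred, a) for a in annotations)
```

Under these assumptions, a lower value means the predicted gaze target lies closer to where annotators marked the person as looking; the minimum variant rewards matching any single annotator, while the average variant rewards agreement with all of them.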
