A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction

About

Gaze following and social gaze prediction are fundamental tasks providing insights into human communication behaviors, intent, and social interactions. Most previous approaches addressed these tasks separately, either by designing highly specialized social gaze models that do not generalize to other social gaze tasks or by considering social gaze inference as an ad-hoc post-processing of the gaze following task. Furthermore, the vast majority of gaze following approaches have proposed static models that can handle only one person at a time, therefore failing to take advantage of social interactions and temporal dynamics. In this paper, we address these limitations and introduce a novel framework to jointly predict the gaze target and social gaze label for all people in the scene. The framework comprises: (i) a temporal, transformer-based architecture that, in addition to image tokens, handles person-specific tokens capturing the gaze information related to each individual; and (ii) a new dataset, VSGaze, that unifies annotation types across multiple gaze following and social gaze datasets. We show that our model trained on VSGaze can address all tasks jointly and achieves state-of-the-art results for multi-person gaze following and social gaze prediction.

Anshul Gupta, Samy Tafasca, Arya Farkhondeh, Pierre Vuillecard, Jean-Marc Odobez • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Gaze Following | GazeFollow (test) | AUC 0.929 | 24 |
| Gaze Following | VAT (test) | Distance Error 0.11 | 11 |
| Gaze following in video | VAT (test) | Distance Error 0.11 | 11 |
| Gaze Following | ChildPlay | Distance 0.113 | 10 |
| Social Gaze Prediction | VAT | F1 (LAH) 82 | 7 |
| Social Gaze Prediction | ChildPlay | F1 (LAH) 68.2 | 7 |
| Social Gaze Prediction | VideoCoAtt | F1 (LAH) 82 | 7 |
| Social Gaze Prediction | UCO-LAEO | F1 Score (LAH) 99.5 | 7 |
| Social Gaze Prediction | VSGaze | F1 (LAH) 80.7 | 7 |
| Gaze Following | VAT | Distance Error 0.116 | 6 |
Showing 10 of 15 rows
