GazeD: Context-Aware Diffusion for Accurate 3D Gaze Estimation

About

We introduce GazeD, a new 3D gaze estimation method that jointly provides 3D gaze and human pose from a single RGB image. Leveraging the ability of diffusion models to deal with uncertainty, it generates multiple plausible 3D gaze and pose hypotheses based on the 2D context information extracted from the input image. Specifically, we condition the denoising process on the 2D pose, the surroundings of the subject, and the context of the scene. With GazeD we also introduce a novel way of representing the 3D gaze by positioning it as an additional body joint at a fixed distance from the eyes. The rationale is that the gaze is usually closely related to the pose, and thus it can benefit from being jointly denoised during the diffusion process. Evaluations across three benchmark datasets demonstrate that GazeD achieves state-of-the-art performance in 3D gaze estimation, even surpassing methods that rely on temporal information. Project details will be available at https://aimagelab.ing.unimore.it/go/gazed.
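The gaze-as-joint representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coordinate convention, and the value of `EYE_TO_GAZE_DISTANCE` are assumptions made for illustration, since the abstract only states that the gaze is placed as an extra joint at a fixed distance from the eyes.

```python
import math

# Hypothetical fixed eye-to-gaze-joint distance; the abstract does not give a value.
EYE_TO_GAZE_DISTANCE = 1.0


def _normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("a zero-length vector has no direction")
    return tuple(c / norm for c in v)


def gaze_to_joint(eye, direction, distance=EYE_TO_GAZE_DISTANCE):
    """Place a virtual gaze joint `distance` away from the eye along the gaze direction."""
    unit = _normalize(direction)
    return tuple(e + distance * u for e, u in zip(eye, unit))


def joint_to_gaze(eye, gaze_joint):
    """Recover the unit gaze direction from the eye position and the gaze joint."""
    return _normalize(tuple(g - e for g, e in zip(gaze_joint, eye)))


# Usage: encode a gaze direction as a joint, then decode it again.
eye = (0.0, 1.6, 0.0)
joint = gaze_to_joint(eye, (0.0, 0.0, 2.0))
direction = joint_to_gaze(eye, joint)
```

Because the gaze joint lives in the same coordinate space as the body joints, it can be appended to the pose and processed alongside it, which is the property the abstract relies on for joint denoising.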

Riccardo Catalini, Davide Di Nucci, Guido Borghi, Davide Davoli, Lorenzo Garattoni, Gianpiero Francesca, Yuki Kawana, Roberto Vezzani • 2026

Related benchmarks

Task	Dataset	Metric	Result	
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	33.8	122
3D Human Pose Estimation	Human3.6M (S9, S11)	Average Error (MPJPE Avg)	41.1	94
3D Gaze Estimation	GFIE (test)	MAE 3D	9.9	23
3D Gaze Estimation	GAFA Office	MAE2D	11.6	9
3D Gaze Estimation	GAFA Living Room	2D MAE	13.8	9
3D Gaze Estimation	GAFA Kitchen	MAE2D	13.7	9
3D Gaze Estimation	GAFA Library	MAE2D	12.9	9
3D Gaze Estimation	GAFA All	MAE2D	16.3	9
3D Gaze Estimation	GAFA Courtyard	MAE2D	27.9	9
3D Gaze Estimation	Ego-Gaze (val)	MAE3D (Basket)	15.4	3
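
The page does not define the metrics in the table. The sketch below assumes the conventional definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth joints, and MAE as the mean angular error, in degrees, between predicted and ground-truth gaze directions. Function names are hypothetical, and individual benchmarks may add protocol steps (for example root alignment, or projecting the gaze to 2D) that are not shown here.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between corresponding predicted and ground-truth joints."""
    if len(pred_joints) != len(gt_joints) or not pred_joints:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))
    return total / len(pred_joints)


def angular_error_deg(pred_dir, gt_dir):
    """Angle in degrees between two gaze direction vectors."""
    dot = sum(p * g for p, g in zip(pred_dir, gt_dir))
    norm = math.hypot(*pred_dir) * math.hypot(*gt_dir)
    if norm == 0.0:
        raise ValueError("gaze directions must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(pred_dirs, gt_dirs):
    """Average angular error over a set of samples."""
    if len(pred_dirs) != len(gt_dirs) or not pred_dirs:
        raise ValueError("direction lists must be non-empty and of equal length")
    errors = [angular_error_deg(p, g) for p, g in zip(pred_dirs, gt_dirs)]
    return sum(errors) / len(errors)
```

Under these assumptions, lower values are better for every row in the table.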
