Multi-task head pose estimation in-the-wild

About

We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and placing the tasks that depend on spatial information, such as visibility and alignment, in the final decoder layer also contributes to increasing the final performance. In the experiments conducted, the proposed model outperforms the state of the art in the face pose and visibility tasks. By including a final landmark regression step, it also produces face alignment results on par with the state of the art.

Roberto Valle, José Miguel Buenaposada, Luis Baumela • 2022

Related benchmarks

Task	Dataset	Result
Face Alignment	COFW (test)	NME 5.04	72
Head Pose Estimation	BIWI (test)	Yaw Error 3.98	62
Head Pose Estimation	AFLW 3D 2000 (test)	MAE (Yaw) 3.34	50
Head Pose Estimation	BIWI	MAE 3.66	46
6DoF head pose estimation	BIWI (test)	Yaw Error 3.98	31
Face Alignment	AFLW2000-3D (test)	NME (Full height) 2.58	29
Head Pose Estimation	BIWI cross-domain	Yaw Error 3.98	26
Head Pose Estimation	AFLW	Yaw MAE 4.16	21
Head Pose Estimation	AFLW2000-3D	Yaw MAE 3.34	20
Head Pose Estimation	AFLW2000	Euler Yaw Error 3.34	16

Showing 10 of 15 rows
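
The metrics named in the table (MAE on Euler angles, NME on landmarks) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, angles are assumed to be in degrees, NME is assumed to be reported as a percentage, and the normalizer (for example an inter-ocular distance or a bounding-box size) differs between benchmarks and is left as a parameter.

```python
import math


def angular_difference(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees.

    Wrapping is an assumption here; it only matters when predictions
    and labels fall on opposite sides of the +/-180 degree boundary.
    """
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def mean_absolute_error(pred_angles, true_angles):
    """MAE over one Euler angle (e.g. yaw) across a set of images."""
    if not pred_angles or len(pred_angles) != len(true_angles):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_difference(p, t) for p, t in zip(pred_angles, true_angles))
    return total / len(pred_angles)


def normalized_mean_error(pred_points, true_points, normalizer):
    """NME for one face: mean landmark distance over a normalizer, in percent.

    pred_points and true_points are sequences of (x, y) pairs.
    """
    if not pred_points or len(pred_points) != len(true_points):
        raise ValueError("inputs must be non-empty and of equal length")
    if normalizer <= 0:
        raise ValueError("normalizer must be positive")
    distances = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred_points, true_points)
    ]
    return 100.0 * sum(distances) / (len(distances) * normalizer)
```

A dataset-level NME would then be the average of the per-face values. Whether a benchmark row reports a single angle or the mean over yaw, pitch and roll depends on that benchmark's protocol.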
