Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation

About

As a fine-grained, local measure of expression behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is known to require time-consuming, labor-intensive, and error-prone annotation. A long-standing challenge of FAU analysis is therefore the scarcity of manual annotations, which largely limits the generalization ability of trained models. Many previous works have tried to alleviate this issue with semi- or weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not removed the heavy dependence on data annotation. This paper introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images without additional data annotations. After being fine-tuned on AU datasets, MAE-Face shows convincing performance on both AU detection and AU intensity estimation, achieving a new state of the art on nearly all evaluation results. Further investigation shows that MAE-Face still achieves decent performance when fine-tuned on only 1% of the AU training set, demonstrating its robustness and generalization ability.
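To make the pre-training step concrete, here is a minimal sketch of generic masked autoencoding: an image is split into patches, a random subset is hidden, the encoder sees only the visible patches, and the reconstruction loss is computed on the hidden patches only. The 75% mask ratio and 16x16 patches on a 224x224 crop are the defaults of the original MAE recipe; whether MAE-Face uses the same settings is an assumption, and the function names `random_patch_mask` and `masked_mse` are hypothetical, not taken from the MAE-Face code.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(
    num_patches: int, mask_ratio: float = 0.75, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split patch indices into visible (encoder input) and masked (reconstruction targets).

    The 0.75 default mirrors the generic MAE recipe; MAE-Face's actual ratio is assumed.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random masking, no spatial structure
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


def masked_mse(
    pred: List[List[float]], target: List[List[float]], masked: List[int]
) -> float:
    """Mean squared reconstruction error over masked patches only; visible patches are ignored."""
    total = 0.0
    count = 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical setting: a 224x224 face crop cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Because the encoder only processes the 49 visible patches, pre-training on large unlabeled face collections stays relatively cheap; the AU labels are needed only in the later fine-tuning stage.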

Bowen Ma, Rudong An, Wei Zhang, Yu Ding, Zeng Zhao, Rongsheng Zhang, Tangjie Lv, Changjie Fan, Zhipeng Hu • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Action Unit Detection	DISFA	--	--	21
Arousal Prediction	SEWA DB	Accuracy	70	16
Valence Prediction	SEWA DB	Accuracy	81	16
Action Unit Detection	BP4D (5-fold cross-val)	Average Performance	67.4	14
Arousal Prediction	Aff-Wild2	Accuracy (3s, 10%)	65	10
Valence Prediction	Aff-Wild2	Accuracy (3s, 10%)	58	10
Action Unit Intensity Estimation	DISFA (test)	Avg ICC	0.674	7
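The "Avg ICC" figure for DISFA intensity estimation is an intraclass correlation averaged over AUs. A minimal sketch follows, assuming the common ICC(3,1) variant (two-way mixed, consistency, single measure) with prediction and ground truth treated as two raters; the exact variant and AU set used for this benchmark are not stated here, and `icc_3_1` / `average_icc` are hypothetical helper names.

```python
from typing import Dict, List, Sequence, Tuple


def icc_3_1(labels: Sequence[float], preds: Sequence[float]) -> float:
    """ICC(3,1) between ground-truth intensities and predictions (k = 2 'raters').

    Assumption: the benchmark uses ICC(3,1); other ICC variants give different numbers.
    """
    n, k = len(labels), 2
    if n != len(preds) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 frames")
    rows = [(float(y), float(p)) for y, p in zip(labels, preds)]
    grand = sum(y + p for y, p in rows) / (n * k)
    row_means = [(y + p) / k for y, p in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-frame variation
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # systematic rater offset
    ss_err = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err
    if denom == 0:
        raise ValueError("ICC undefined when all values are constant")
    return (ms_rows - ms_err) / denom


def average_icc(per_au: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean of per-AU ICC(3,1) scores, i.e. an 'Avg ICC' style summary."""
    scores: List[float] = [icc_3_1(y, p) for y, p in per_au.values()]
    return sum(scores) / len(scores)


# Toy data (not from DISFA): AU12 predicted perfectly, AU4 with a constant +1 offset.
toy = {
    "AU12": ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]),
    "AU4": ([0, 2, 1, 3, 0], [1, 3, 2, 4, 1]),
}
print(round(average_icc(toy), 4))  # 1.0
```

ICC(3,1) measures consistency, so a constant offset between prediction and label (the AU4 case above) still scores 1.0; a metric sensitive to absolute agreement, such as MAE, would penalize it.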

