MARLIN: Masked Autoencoder for facial video Representation LearnINg

About

This paper proposes a self-supervised approach to learning universal facial representations from videos that transfer across a variety of facial analysis tasks, such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available, non-annotated, web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from densely masked facial regions, which mainly include the eyes, nose, mouth, lips, and skin, to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate that MARLIN is an excellent facial video encoder as well as feature extractor that performs consistently well across downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), and LS (29.36% gain for Fréchet Inception Distance), even in the low-data regime. Our code and models are available at https://github.com/ControlNet/MARLIN.
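
The masked-reconstruction idea in the abstract can be illustrated with a minimal NumPy sketch. This is not MARLIN's implementation: it hides patches uniformly at random rather than targeting facial regions, and the patch size, clip size, mask ratio, and the helper names `patchify`, `random_mask`, and `masked_mse` are all assumptions chosen for illustration. It only shows the general pattern of splitting a clip into spatio-temporal patches, hiding most of them, and scoring a reconstruction on the hidden patches alone.

```python
import numpy as np

def patchify(video, t=2, p=16):
    """Split a (T, H, W, C) clip into flattened (t, p, p, C) spatio-temporal patches."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nT, nH, nW, t, p, p, C)
    return x.reshape(-1, t * p * p * C)

def random_mask(num_patches, mask_ratio, rng):
    """Boolean array where True marks a hidden patch (uniform random; an assumption)."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared error over the hidden patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
video = rng.random((4, 32, 32, 3))       # tiny synthetic clip, not real face data
patches = patchify(video)                # 2 * 2 * 2 = 8 patches of length 1536
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[~mask]                 # the part an encoder would receive
pred = np.zeros_like(patches)            # placeholder for a decoder's output
loss = masked_mse(pred, patches, mask)   # loss ignores the visible patches
```

In a real masked autoencoder, `pred` would come from a decoder fed with the encoded visible patches; the all-zeros placeholder here only keeps the sketch self-contained.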

Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat • 2022

Related benchmarks

Task	Dataset	Result
Sentiment Analysis	CMU-MOSEI (test)	--	96
Deepfake Detection	FaceForensics++ c23 (train)	FF c23 Score 93.7	31
Deepfake Detection	Cross-Domain Evaluation (test)	CDFv1 Score 71.4	31
Deepfake Detection	FaceForensics++ (FF) (test)	Average AUC (FF) 0.981	22
Emotion Recognition	CMU-MOSEI	--	19
Deepfake Detection	FaceForensics++ LQ	AUC 0.9305	17
Lip-syncing	LRS2 1 (test)	LSE-D 7.127	12
Deepfake Detection	AV-Deepfake1M official (test)	AUC 0.5803	11
Deepfake Detection	Celeb-DF (CDF) (test)	Avg CDF AUC 0.796	9
Facial Attribute Recognition	CelebV-HQ	Appearance Accuracy 93.9	6

Showing 10 of 11 rows
