Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos
About
Face video anonymization aims to preserve privacy while still allowing videos to be analyzed in downstream computer vision tasks such as expression recognition, people tracking, and action recognition. We propose a novel unified framework, referred to as Anon-NET, streamlined to de-identify facial videos while preserving the age, gender, race, pose, and expression of the original video. Specifically, we inpaint faces with a diffusion-based generative model guided by high-level attribute recognition and motion-aware expression transfer. We then animate the de-identified faces through video-driven animation, which takes the de-identified face and the original video as input. Extensive experiments on the VoxCeleb2, CelebV-HQ, and HDTF datasets, which include diverse facial dynamics, demonstrate the effectiveness of Anon-NET in obfuscating identity while retaining visual realism and temporal consistency. The code of Anon-NET will be publicly released.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Face Re-identification | CelebA-HQ | VGG Error | 0.041 | 7 |
| Face Re-identification | LFW | VGG Error | 0.042 | 7 |
| Perceptual Quality and Aesthetic Appeal | CelebA-HQ | Perceptual Quality Score | 4.164 | 5 |
| Perceptual Quality and Aesthetic Appeal | LFW | Quality Score | 2.914 | 5 |
| Gaze Preservation | CelebA-HQ | Gaze Score | 18.7 | 5 |
| Pose Preservation | CelebA-HQ | Pose Score | 0.015 | 5 |
| Face Anonymization | VoxCeleb2 | Identity Preservation | 0.022 | 2 |
| Face Anonymization | CelebV-HQ | ID Preservation | 0.013 | 2 |
| Face Anonymization | HDTF | ID Preservation | 0.007 | 2 |