
PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training

About

Facial representation pre-training is crucial for tasks like facial recognition, expression analysis, and virtual reality. However, existing methods face three key challenges: (1) failing to capture distinct facial features and fine-grained semantics, (2) ignoring the spatial structure inherent to facial anatomy, and (3) inefficiently utilizing limited labeled data. To overcome these, we introduce PaCo-FR, an unsupervised framework that combines masked image modeling with patch-pixel alignment. Our approach integrates three innovative components: (1) a structured masking strategy that preserves spatial coherence by aligning with semantically meaningful facial regions, (2) a novel patch-based codebook that enhances feature discrimination with multiple candidate tokens, and (3) spatial consistency constraints that preserve geometric relationships between facial components. PaCo-FR achieves state-of-the-art performance across several facial analysis tasks with just 2 million unlabeled images for pre-training. Our method demonstrates significant improvements, particularly in scenarios with varying poses, occlusions, and lighting conditions. We believe this work advances facial representation learning and offers a scalable, efficient solution that reduces reliance on expensive annotated datasets, driving more effective facial analysis systems.
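The abstract describes the structured masking and the multi-candidate codebook only at a high level. The sketch below is a hypothetical illustration of those two ideas, not the authors' implementation. `structured_region_mask` assumes each patch already carries a facial-region id (for example, one derived from a parsing map) and masks whole regions until roughly `mask_ratio` of the patches are hidden. `top_k_codewords` assumes a fixed codebook and returns the k nearest codewords by squared Euclidean distance as the candidate tokens for one patch. The function names, the region map, and the distance metric are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Set


def structured_region_mask(
    region_of_patch: Sequence[int],
    mask_ratio: float,
    seed: int = 0,
) -> Set[int]:
    """Hypothetical structured masking: mask entire facial regions (given as
    per-patch region ids) until at least ~mask_ratio of patches are masked."""
    # Group patch indices by their (assumed) facial-region id.
    regions: Dict[int, List[int]] = {}
    for idx, region in enumerate(region_of_patch):
        regions.setdefault(region, []).append(idx)

    rng = random.Random(seed)
    region_ids = list(regions)
    rng.shuffle(region_ids)  # pick regions to mask in random order

    target = int(round(mask_ratio * len(region_of_patch)))
    masked: Set[int] = set()
    for region in region_ids:
        if len(masked) >= target:
            break
        masked.update(regions[region])  # mask the whole region at once
    return masked


def top_k_codewords(
    feature: Sequence[float],
    codebook: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical multi-candidate lookup: indices of the k codewords
    closest to `feature` under squared Euclidean distance."""
    def sq_dist(code: Sequence[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(feature, code))

    order = sorted(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return order[:k]


# Toy 4x4 patch grid split into four hypothetical 2x2 "regions".
regions = [0, 0, 1, 1,
           0, 0, 1, 1,
           2, 2, 3, 3,
           2, 2, 3, 3]
mask = structured_region_mask(regions, mask_ratio=0.5, seed=0)
print(len(mask))  # → 8 (two whole regions masked)

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(top_k_codewords([0.9, 0.1], codebook, k=2))  # → [1, 0]
```

Masking whole regions rather than independent random patches is what keeps the hidden area spatially coherent; returning several nearest codewords instead of a single argmin is one simple reading of "multiple candidate tokens". Because regions are masked whole, the masked fraction can overshoot `mask_ratio` when regions differ in size.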

Yin Xie, Zhichao Chen, Zeyu Xiao, Yongle Zhao, Xiang An, Kaicheng Yang, Zimin Ran, Jia Guo, Ziyong Feng, Jiankang Deng • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Face Alignment | WFLW (test) | NME (%) | 3.99 | 144 |
| Face Alignment | 300W Fullset (test) | -- | -- | 82 |
| Face Parsing | LaPa (test) | Skin Accuracy | 97.63 | 39 |
| Face Alignment | AFLW Frontal 19 landmarks (test) | NMEdiag | 0.82 | 26 |
| Face Alignment | 300W Common Subset (test) | -- | -- | 25 |
| Monocular 3D Face Reconstruction | NoW (val) | -- | -- | 20 |
| Face Alignment | 300W Challenge (test) | NME (Inter-ocular) | 4.5 | 15 |
| Face Alignment | AFLW Full 19-point (test) | NMEdiag | 0.955 | 10 |
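Most face-alignment rows in the benchmark table report a normalized mean error (NME). As commonly defined in the face-alignment literature, NME = (1/N) Σ_i ||p_i − g_i||_2 / d, where p_i and g_i are the predicted and ground-truth positions of landmark i, N is the number of landmarks, and d is a normalization distance. For the "NME (Inter-ocular)" row, d is a distance between eye landmarks; for the AFLW "NMEdiag" rows we assume d is the diagonal of the face bounding box. The exact eye-landmark indices and box definition depend on each benchmark's protocol, so the sketch below takes them as parameters; the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme(pred: Sequence[Point], gt: Sequence[Point], norm: float) -> float:
    """Normalized mean error: mean point-to-point L2 error divided by `norm`.
    Multiply by 100 to report it as a percentage."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / (len(gt) * norm)


def inter_ocular(gt: Sequence[Point], left_idx: int, right_idx: int) -> float:
    """Distance between two eye landmarks; which indices to use depends on
    the dataset's annotation scheme (assumption, not fixed here)."""
    return math.dist(gt[left_idx], gt[right_idx])


def bbox_diagonal(width: float, height: float) -> float:
    """Diagonal of the face bounding box, sqrt(w^2 + h^2)."""
    return math.hypot(width, height)


# Two landmarks: one exact, one off by a 3-4-5 triangle (error 5.0).
pred = [(0.0, 0.0), (3.0, 4.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]
print(nme(pred, gt, norm=10.0) * 100)  # → 25.0 (percent)
```

Because the metric is divided by d, NME values are only comparable across rows that share the same normalization, which is why the table keeps "NME (Inter-ocular)" and "NMEdiag" as separate metrics.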
