Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

About

Heatmap regression methods have dominated the face alignment area in recent years, but they ignore the inherent relation between different landmarks. In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. The SLPT generates the representation of each single landmark from a local patch and aggregates these representations through an adaptive inherent relation based on the attention mechanism. The subpixel coordinate of each landmark is predicted independently from the aggregated feature. Moreover, a coarse-to-fine framework is introduced to work with the SLPT, which enables the initial landmarks to gradually converge to the target facial landmarks using fine-grained features from dynamically resized local patches. Extensive experiments carried out on three popular benchmarks, WFLW, 300W and COFW, demonstrate that the proposed method works at the state-of-the-art level with much lower computational complexity by learning the inherent relation between facial landmarks. The code is available at the project website.
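The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `refine_landmarks`, `make_toy_predictor`, the patch sizes, and the clamped "oracle" predictor are all hypothetical stand-ins chosen for illustration. In SLPT the per-patch prediction would come from the transformer; here a toy function plays that role so the loop is runnable on its own.

```python
def refine_landmarks(initial, predict_offset, patch_sizes):
    """Refine landmarks over several stages with shrinking patches.

    initial: list of (x, y) starting estimates in normalized image coordinates.
    predict_offset: hypothetical callable (index, center, size) -> (dx, dy),
        an offset relative to the patch, each component in [-0.5, 0.5].
    patch_sizes: patch side length used at each stage (assumed to decrease).
    """
    landmarks = list(initial)
    for size in patch_sizes:
        updated = []
        for index, (x, y) in enumerate(landmarks):
            dx, dy = predict_offset(index, (x, y), size)
            # Convert the patch-relative offset back to image coordinates.
            updated.append((x + dx * size, y + dy * size))
        landmarks = updated
    return landmarks


def make_toy_predictor(targets):
    """Toy stand-in for a learned model: points at the target, but can
    never predict a location outside the current patch."""
    def clamp(value):
        return max(-0.5, min(0.5, value))

    def predict(index, center, size):
        tx, ty = targets[index]
        cx, cy = center
        return clamp((tx - cx) / size), clamp((ty - cy) / size)

    return predict


targets = [(0.62, 0.41)]
refined = refine_landmarks(
    initial=[(0.5, 0.5)],
    predict_offset=make_toy_predictor(targets),
    patch_sizes=[0.25, 0.125, 0.0625],
)
```

Because each stage can only move a landmark within its current patch, the early, larger patches handle coarse correction while the later, smaller patches offer finer resolution around the estimate.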

Jiahao Xia, Weiwei Qu, Wenjian Huang, Jianguo Zhang, Xi Wang, Min Xu • 2022

Related benchmarks

Task	Dataset	Metric	Result	
Facial Landmark Detection	300-W (Common)	--	--	180
Facial Landmark Detection	300-W (Fullset)	Mean Error (%)	3.17	174
Facial Landmark Detection	300W (Challenging)	--	--	159
Face Alignment	WFLW (test)	NME (%) (Testset)	4.12	144
Facial Landmark Detection	WFLW (test)	Mean Error (ME) - All	4.12	122
Face Alignment	300W (Challenging)	NME	4.93	93
Facial Landmark Detection	COFW (test)	NME	4.79	93
Face Alignment	300W Common	NME	2.78	90
Face Alignment	300W Fullset (test)	--	--	82
Face Alignment	COFW (test)	--	--	72

Showing 10 of 36 rows
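For readers unfamiliar with the NME figures above, the following is a minimal sketch of how a normalized mean error is commonly computed for one face. The function name `normalized_mean_error` is hypothetical, and the choice of normalizing distance (often inter-ocular or inter-pupil distance) varies by benchmark; which one each row uses is not stated on this page.

```python
import math


def normalized_mean_error(predicted, ground_truth, norm_distance):
    """Mean Euclidean landmark error divided by a normalizing distance.

    predicted, ground_truth: equal-length lists of (x, y) points.
    norm_distance: positive normalizer, e.g. an inter-ocular distance
        (an assumption; benchmarks differ in what they use).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("landmark lists must be non-empty and equal in length")
    if norm_distance <= 0:
        raise ValueError("norm_distance must be positive")
    total = sum(
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    )
    return total / (len(predicted) * norm_distance)


# Two landmarks with errors of 0 and 5 pixels, normalized by 10 pixels.
example = normalized_mean_error([(0, 0), (3, 4)], [(0, 0), (0, 0)], 10.0)
example_percent = 100.0 * example
```

A dataset-level score is typically the average of this per-face value over all test images, reported as a percentage.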
