
FGTBT: Frequency-Guided Task-Balancing Transformer for Unified Facial Landmark Detection

About

Recently, deep learning-based facial landmark detection (FLD) methods have achieved considerable success. However, in challenging scenarios such as large pose variations, illumination changes, and facial expression variations, they still struggle to accurately capture the geometric structure of the face, resulting in performance degradation. Moreover, the limited size and diversity of existing FLD datasets hinder robust model training, leading to reduced detection accuracy. To address these challenges, we propose a Frequency-Guided Task-Balancing Transformer (FGTBT), which enhances facial structure perception through frequency-domain modeling and multi-dataset unified training. Specifically, we propose a novel Fine-Grained Multi-Task Balancing loss (FMB-loss), which moves beyond coarse task-level balancing by assigning weights to individual landmarks based on their occurrence across datasets. This enables more effective unified training and mitigates the issue of inconsistent gradient magnitudes. Additionally, a Frequency-Guided Structure-Aware (FGSA) model uses frequency-guided structure injection and regularization to help the network learn facial structure constraints. Extensive experimental results on popular benchmark datasets demonstrate that the integration of the proposed FMB-loss and FGSA model into our FGTBT framework achieves performance comparable to state-of-the-art methods. The code is available at https://github.com/Xi0ngxinyu/FGTBT.
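
The abstract does not give the FMB-loss formula, so the following is only an illustrative sketch of what per-landmark weighting by occurrence across datasets could look like. It assumes a unified landmark set of K points, a boolean mask per dataset marking which points it annotates, and inverse-occurrence weights normalised to average 1. The function names `landmark_weights` and `weighted_landmark_loss`, the inverse-occurrence rule, and the L1 distance are all hypothetical choices and are not taken from the paper or its repository.

```python
import numpy as np


def landmark_weights(dataset_masks):
    """Per-landmark weights from occurrence counts (hypothetical rule).

    dataset_masks: (D, K) boolean array, True where dataset d annotates
    unified landmark k. Assumption: a landmark annotated by fewer datasets
    gets a larger weight (inverse occurrence), normalised to mean 1.
    """
    masks = np.asarray(dataset_masks, dtype=bool)
    counts = masks.sum(axis=0)            # datasets annotating each landmark
    weights = np.zeros(masks.shape[1], dtype=float)
    seen = counts > 0                     # landmarks annotated at least once
    weights[seen] = 1.0 / counts[seen]
    weights[seen] /= weights[seen].mean()
    return weights


def weighted_landmark_loss(pred, target, mask, weights):
    """Weighted mean L1 error over the landmarks annotated for one sample.

    pred, target: (K, 2) coordinates; mask: (K,) bool; weights: (K,).
    """
    per_point = np.abs(np.asarray(pred) - np.asarray(target)).sum(axis=1)
    w = np.asarray(weights, dtype=float) * np.asarray(mask, dtype=float)
    total = w.sum()
    return float((w * per_point).sum() / total) if total > 0 else 0.0


# Toy setup: two datasets sharing landmark 0, each with one private landmark.
masks = [[True, True, False],
         [True, False, True]]
w = landmark_weights(masks)
loss = weighted_landmark_loss(np.zeros((3, 2)), np.ones((3, 2)), masks[0], w)
```

In this toy setup the shared landmark receives a smaller weight than the two dataset-specific ones, which is one way rarely annotated points could be kept from being drowned out during unified training.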

Jun Wan, Xinyu Xiong, Ning Chen, Zhihui Lai, Jie Zhou, Wenwen Min • 2026

Related benchmarks

Task                       Dataset              Result                        Rank
Facial Landmark Detection  300-W (Common)       --                            180
Facial Landmark Detection  300-W (Fullset)      Mean Error (%): 3.06          174
Facial Landmark Detection  300W (Challenging)   --                            159
Facial Landmark Detection  WFLW (test)          Mean Error (ME) - All: 4.42   122
Facial Landmark Detection  COFW (test)          --                            93
Facial Landmark Detection  300W                 --                            52
Facial Landmark Detection  AFLW Full (test)     Average Error: 1.67           26