
FaceLiVT: Face Recognition using Linear Vision Transformer with Structural Reparameterization For Mobile Device

About

This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolutional Neural Network (CNN)-Transformer architecture with an innovative, lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA with a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Extensive evaluations on challenging benchmarks, including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C, highlight its superior performance compared to state-of-the-art lightweight models. MHLA notably improves inference speed, allowing FaceLiVT to deliver high accuracy with lower latency on mobile devices. Specifically, FaceLiVT is 8.6× faster than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices, and 21.2× faster than a pure ViT-based model. With its balanced design, FaceLiVT offers an efficient and practical solution for real-time face recognition on resource-constrained platforms.
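The abstract does not spell out the MHLA formulation, but the general idea behind linear attention is to replace the softmax kernel with a positive feature map φ so that, by associativity, φ(Q)(φ(K)ᵀV) can be computed in O(N) rather than the O(N²) of standard attention. A minimal single-head sketch (not the paper's actual MHLA; the elu(x)+1 feature map is a common choice from the linear-transformer literature, assumed here for illustration):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention sketch: out = phi(Q) @ (phi(K)^T V) / normalizer."""
    # Positive feature map phi(x) = elu(x) + 1 (an illustrative choice).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    # Associativity: phi(K)^T V is (d, d_v), so cost scales linearly in N.
    KV = Kp.T @ V
    # Per-row normalizer: phi(Q) dotted with the column-sum of phi(K).
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T   # shape (N, 1)
    return (Qp @ KV) / (Z + eps)
```

Because the feature-map trick only reorders the matrix products, this yields the same result as the quadratic form `(phi(Q) @ phi(K).T) @ V` with row normalization, while avoiding the N×N attention matrix — the property that drives the latency gains cited above.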

Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh• 2025

Related benchmarks

Task               Dataset                                        Metric              Result   Rank
Face Recognition   LFW                                            Accuracy            99.7     206
Face Verification  CA-LFW                                         Accuracy            95.76    98
Face Recognition   CFP-FP                                         Accuracy            97.2     98
Face Recognition   IJB-C                                          TAR @ FAR=1e-4      95.7     51
Face Recognition   IJB-B                                          TAR @ FAR=1e-4      93.7     51
Face Recognition   AgeDB-30                                       Accuracy            97.6     49
Face Recognition   CP-LFW                                         Accuracy            90.97    26
Face Recognition   LFW, CA-LFW, CP-LFW, CFP-FP, AgeDB-30 (test)   Mean Accuracy (%)   96.25    16
Face Recognition   IJB-B                                          TPR @ FPR=1e-4      93.7     11
Face Recognition   IJB-C                                          TPR @ FAR=1e-4      95.7     11
