
CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition

About

Continuous sign language recognition (CSLR) focuses on interpreting and transcribing sequences of sign language gestures in videos. In this work, we propose CLIP sign language adaptation (CLIP-SLA), a novel CSLR framework that adapts the powerful pre-trained visual encoder of the CLIP model to sign language tasks through parameter-efficient fine-tuning (PEFT). We introduce two variants, SLA-Adapter and SLA-LoRA, which integrate PEFT modules into the CLIP visual encoder, enabling fine-tuning with minimal trainable parameters. The effectiveness of the proposed framework is validated on four datasets: Phoenix2014, Phoenix2014-T, CSL-Daily, and Isharah-500, where both CLIP-SLA variants outperform several SOTA models with fewer trainable parameters. Extensive ablation studies demonstrate the effectiveness and flexibility of the proposed methods with different vision-language models for CSLR. These findings showcase the potential of adapting large-scale pre-trained models for scalable and efficient CSLR, paving the way for future advancements in sign language understanding.
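To illustrate the two PEFT styles the abstract names, the sketch below shows a generic bottleneck adapter and a generic LoRA-wrapped linear layer in PyTorch. This is a minimal illustration of the underlying techniques, not the authors' SLA-Adapter or SLA-LoRA implementation; the class names, bottleneck size, and rank are illustrative choices. The frozen base weights stand in for the pre-trained CLIP visual encoder's projections, and only the small added modules are trainable.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    added residually so the frozen backbone's features pass through.
    The up-projection is zero-initialized, so at the start of training
    the adapter is an identity mapping."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where only A and B are trained.
    B is zero-initialized, so the wrapped layer initially matches the
    frozen base layer exactly."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze pre-trained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

In both cases the number of trainable parameters scales with the bottleneck size or rank rather than with the backbone width, which is what makes fine-tuning a large encoder with a small parameter budget feasible.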

Sarah Alyami, Hamzah Luqman • 2025

Related benchmarks

Task | Dataset | Result | Rank
Continuous Sign Language Recognition | CSL-Daily (dev) | WER 26 | 98
Continuous Sign Language Recognition | CSL-Daily (test) | WER 25.8 | 91
Continuous Sign Language Recognition | Phoenix2014-T (dev) | WER 19.8 | 75
Continuous Sign Language Recognition | Phoenix2014-T (test) | WER 19.4 | 43
Continuous Sign Language Recognition | Phoenix2014 (test) | WER 19.3 | 39
Continuous Sign Language Recognition | Phoenix2014 (dev) | WER 19.7 | 29
Continuous Sign Language Recognition | Phoenix2014 (dev test) | -- | 16

Other info

Code
