Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment

About

Our aim is to develop a unified model for sign language understanding that performs both sign language translation (SLT) and sign-subtitle alignment (SSA). Together, these two tasks enable the conversion of continuous signing videos into spoken language text and the temporal alignment of signing with subtitles, both of which are essential for practical communication, large-scale corpus construction, and educational applications. To achieve this, our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images while preserving signer privacy; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings to bridge the vision-text gap; and (iii) a scalable multi-task training strategy that jointly optimises SLT and SSA, reinforcing both linguistic and temporal alignment. To promote cross-linguistic generalisation, we pretrain our model on large-scale sign-text corpora covering British Sign Language (BSL) and American Sign Language (ASL) from the BOBSL and YouTube-SL-25 datasets. With this multilingual pretraining and strong model design, we achieve state-of-the-art results on the challenging BOBSL (BSL) dataset for both SLT and SSA. Our model also demonstrates robust zero-shot generalisation and finetuned SLT performance on How2Sign (ASL), highlighting the potential of scalable translation across different sign languages.

Youngjoon Jang, Liliane Momeni, Zifan Jiang, Joon Son Chung, Gül Varol, Andrew Zisserman • 2025
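
Component (ii) of the abstract, aggregating consecutive visual features into a shorter sequence of word-level embeddings, can be illustrated with a toy sliding-window pooling routine. This is only a sketch under stated assumptions, not the authors' Sliding Perceiver: it substitutes single-query dot-product attention for the Perceiver block, and the function names, the `window` and `stride` values, and the query vector are all hypothetical.

```python
import math


def attention_pool(query, frames):
    """Pool one window of frame vectors using single-query dot-product attention.

    Assumption: a single query vector stands in for the latent queries that a
    Perceiver-style module would learn during training.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, f)) / scale for f in frames]
    # Numerically stable softmax over the frames in the window.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(frames[0])
    # Weighted sum of the frame vectors, one value per feature dimension.
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


def sliding_pool(frames, query, window=4, stride=2):
    """Slide a fixed-size window over the frames; emit one embedding per window."""
    if not frames:
        return []
    if len(frames) < window:
        # Sequence shorter than one window: pool everything into one embedding.
        return [attention_pool(query, frames)]
    starts = range(0, len(frames) - window + 1, stride)
    return [attention_pool(query, frames[s:s + window]) for s in starts]


# Ten 2-dimensional "frame features"; a zero query yields uniform weights,
# so each output is simply the mean of its window.
frames = [[float(i), 2.0 * i] for i in range(10)]
pooled = sliding_pool(frames, query=[0.0, 0.0], window=4, stride=2)
```

With these hypothetical settings, ten input frames are reduced to four pooled embeddings. In a trained model the query and projection weights would be learned, and the window and stride would be chosen to match the typical duration of a sign.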

Related benchmarks

Task | Dataset | Result | Rank
Sign Language Translation | How2Sign (test) | BLEU-4: 24.6 | 61
Sign Language Translation | BOBSL SENT (test) | B4: 6.8 | 23
Isolated Sign Language Recognition | BOBSL-SIGN (test) | Top-1 Accuracy (Instance): 0.784 | 4
Sign Language Translation | FLEURS ASL (devtest) | BLEU-4: 13 | 2