SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding

About

Hand gestures play a crucial role in the expression of sign language. Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources and suffer from limited interpretability. In this paper, we propose SignBERT+, the first self-supervised pre-trainable framework that incorporates a model-aware hand prior. In our framework, the hand pose is regarded as a visual token, which is derived from an off-the-shelf detector. Each visual token is embedded with gesture state and spatial-temporal position encoding. To take full advantage of current sign data resources, we first perform self-supervised learning to model their statistics. To this end, we design multi-level masked modeling strategies (joint, frame and clip) to mimic common failure detection cases. Jointly with these masked modeling strategies, we incorporate the model-aware hand prior to better capture hierarchical context over the sequence. After pre-training, we carefully design simple yet effective prediction heads for downstream tasks. To validate the effectiveness of our framework, we perform extensive experiments on three main SLU tasks: isolated sign language recognition (SLR), continuous SLR, and sign language translation (SLT). Experimental results demonstrate the effectiveness of our method, achieving new state-of-the-art performance with a notable gain.
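
The multi-level masking idea (joint, frame and clip) can be illustrated with a small sketch. This is not the paper's procedure: the function name `multi_level_mask`, the probabilities, the clip length, and the choice to zero out masked joints are all assumptions made for illustration. A hand pose sequence is represented here as a list of frames, each holding a list of `(x, y)` joint coordinates.

```python
import random


def multi_level_mask(poses, rng, joint_prob=0.2, frame_prob=0.1, clip_len=4):
    """Hide parts of a pose sequence at three granularities.

    poses: list of frames, each a list of (x, y) joint tuples.
    Returns (masked, mask), where mask[t][j] is True for hidden joints.
    All parameters are illustrative assumptions, not values from the paper.
    """
    num_frames = len(poses)
    num_joints = len(poses[0])

    # Joint level: each joint in each frame is hidden independently.
    mask = [
        [rng.random() < joint_prob for _ in range(num_joints)]
        for _ in range(num_frames)
    ]

    # Frame level: some frames lose every joint, as if detection failed.
    for t in range(num_frames):
        if rng.random() < frame_prob:
            mask[t] = [True] * num_joints

    # Clip level: one run of consecutive frames is hidden entirely.
    clip_len = min(clip_len, num_frames)
    start = rng.randrange(num_frames - clip_len + 1)
    for t in range(start, start + clip_len):
        mask[t] = [True] * num_joints

    # Build the corrupted copy; hidden joints are replaced with zeros.
    masked = [
        [(0.0, 0.0) if mask[t][j] else poses[t][j] for j in range(num_joints)]
        for t in range(num_frames)
    ]
    return masked, mask


rng = random.Random(0)
poses = [[(1.0, 1.0)] * 21 for _ in range(16)]  # 16 frames, 21 joints each
masked, mask = multi_level_mask(poses, rng)
```

In a pre-training setup along these lines, a model would receive `masked` and be trained to reconstruct the original joints at the positions where `mask` is true.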

Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li • 2023

Related benchmarks

Task | Dataset | Result | Rank
Continuous Sign Language Recognition | PHOENIX 2014 (dev) | Word Error Rate 19.9 | 188
Continuous Sign Language Recognition | PHOENIX-2014 (test) | WER 20 | 185
Sign Language Translation | PHOENIX-2014T (test) | BLEU-4 25.7 | 159
Sign Language Translation | PHOENIX-2014T (dev) | BLEU-4 Score 24.95 | 111
Continuous Sign Language Recognition | PHOENIX14-T (dev) | WER 18.8 | 75
Isolated Sign Language Recognition | WLASL 100 | Per-instance Top-1 Acc 84.11 | 46
Continuous Sign Language Recognition | PHOENIX-2014T (test) | WER 19.9 | 43
Sign Language Recognition | PHOENIX-2014T (test) | WER 0.199 | 41
Sign Language Recognition | PHOENIX 2014 (dev) | WER 18.8 | 32
Isolated Sign Language Recognition | WLASL 300 | Top-1 Accuracy (Instance) 78.44 | 28
Showing 10 of 22 rows
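
Most rows above report word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference, divided by the number of reference words. The function name `word_error_rate` and the example gloss strings are illustrative assumptions, and benchmark evaluation scripts may apply extra text normalization that is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (B -> X) and one deletion (D) over four reference words.
print(word_error_rate("A B C D", "A X C"))  # 0.5
```

The function returns a fraction; multiplying by 100 gives a percentage. That may be why the table contains both 19.9 and 0.199 for the PHOENIX-2014T test set, though the listing itself does not say so.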
