BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
About
Recent progress in fine-grained gesture and action classification, and in machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) we show how to use mouthing cues from signers to obtain high-quality annotations from video data; the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) we show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL, and that these models additionally form excellent pretraining for other sign languages and benchmarks: we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.
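The abstract does not spell out how weakly-aligned subtitles and a keyword spotter are combined to localise a sign. The Python below is a minimal sketch of one plausible scheme, not the authors' implementation: it assumes a spotter has already produced a per-frame probability that a given word is being mouthed (the hypothetical `frame_scores` input), pads the subtitle's time span because the alignment is only approximate, and accepts the peak frame only if its confidence clears a threshold. The names `Subtitle`, `SignAnnotation`, `localise_keyword` and the values of `fps`, `pad_sec` and `threshold` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Subtitle:
    text: str
    start_sec: float  # weakly aligned: may be off by a few seconds
    end_sec: float


@dataclass
class SignAnnotation:
    word: str
    frame: int
    confidence: float


def localise_keyword(
    subtitle: Subtitle,
    word: str,
    frame_scores: Sequence[float],
    fps: float = 25.0,       # assumed frame rate
    pad_sec: float = 2.0,    # assumed padding around the subtitle span
    threshold: float = 0.5,  # assumed minimum spotter confidence
) -> Optional[SignAnnotation]:
    """Return the most confident frame for `word` near `subtitle`, or None.

    `frame_scores[t]` is a hypothetical keyword-spotter probability that
    `word` is mouthed at frame t; the spotter itself is not shown here.
    """
    # Only search for words that actually occur in the subtitle text.
    tokens = [tok.strip(".,!?;:\"'").lower() for tok in subtitle.text.split()]
    if word.lower() not in tokens:
        return None
    # Pad the subtitle interval because its timing is only approximate.
    lo = max(0, int((subtitle.start_sec - pad_sec) * fps))
    hi = min(len(frame_scores), int((subtitle.end_sec + pad_sec) * fps) + 1)
    if lo >= hi:
        return None
    # Take the frame with the highest spotter confidence inside the window.
    best = max(range(lo, hi), key=lambda t: frame_scores[t])
    if frame_scores[best] < threshold:
        return None  # too uncertain to use as an automatic annotation
    return SignAnnotation(word=word, frame=best, confidence=frame_scores[best])


# Usage: 10 s of simulated scores at 25 fps with a mouthing peak at 5.2 s.
scores = [0.0] * 250
scores[130] = 0.9
sub = Subtitle("We went to the BEACH today.", start_sec=4.0, end_sec=6.0)
print(localise_keyword(sub, "beach", scores))
```

Raising `threshold` trades recall for annotation precision, which matters when the localised clips are used directly as training labels.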
Related benchmarks
| Task | Dataset | Metric | Value (%) | Rank |
|---|---|---|---|---|
| Isolated Sign Language Recognition | WLASL 100 | Per-instance top-1 accuracy | 46.82 | 46 |
| Isolated Sign Language Recognition | WLASL 300 | Per-instance top-1 accuracy | 53.72 | 28 |
| Isolated Sign Language Recognition | MSASL 1000 | Per-class top-1 accuracy | 61.55 | 25 |
| Sign Language Recognition | WLASL (test) | Top-1 accuracy | 46.9 | 17 |
| Sign Language Recognition | WLASL2000 v1.0 (test) | Per-instance top-1 accuracy | 46.82 | 12 |
| Isolated Sign Language Recognition | WLASL | Per-instance top-1 accuracy | 46.82 | 9 |
| Isolated Sign Language Recognition | MSASL | Per-class top-1 accuracy | 61.55 | 8 |
| Sign Language Recognition | MSASL 1000 (test) | Per-instance top-1 accuracy | 64.71 | 8 |
| Sign Recognition | BSL-1K 37K_Rec (test) | Per-instance top-1 accuracy | 40.8 | 7 |
| Sign Recognition | WLASL (test) | Per-instance top-1 accuracy | 46.82 | 3 |
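The benchmarks above mix per-instance and per-class top-1 accuracy, which can diverge sharply when the test set is class-imbalanced. The page does not define them, so the Python sketch below assumes the usual definitions: per-instance accuracy is the fraction of test clips whose top-1 prediction is correct, and per-class accuracy averages each class's own accuracy so that rare signs weigh as much as frequent ones. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def _check(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def per_instance_top1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of test clips whose top-1 prediction matches the label."""
    _check(y_true, y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def per_class_top1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean over ground-truth classes of each class's top-1 accuracy (%)."""
    _check(y_true, y_pred)
    totals: dict = defaultdict(int)  # clips per ground-truth class
    hits: dict = defaultdict(int)    # correctly classified clips per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return 100.0 * sum(hits[c] / totals[c] for c in totals) / len(totals)


# A model that always predicts the frequent class 0:
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(per_instance_top1(y_true, y_pred))  # 75.0
print(per_class_top1(y_true, y_pred))     # 50.0
```

The example shows why the two metrics are reported separately: ignoring a rare class costs little per-instance accuracy but halves per-class accuracy here.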