Global-local Enhancement Network for NMFs-aware Sign Language Recognition
About
Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), i.e., facial expressions, mouth shapes, etc. Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity makes such sign words much harder to recognize. To tackle this issue, we propose a simple yet effective architecture called Global-local Enhancement Network (GLE-Net), which consists of two mutually promoted streams that target different crucial aspects of SLR. One stream captures the global contextual relationship, while the other captures discriminative fine-grained cues. Moreover, because no existing dataset explicitly focuses on this kind of feature, we introduce the first non-manual-features-aware isolated Chinese sign language dataset (NMFs-CSL), with a total vocabulary size of 1,067 sign words in daily life. Extensive experiments on the NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.
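To make the two-stream idea concrete, here is a minimal sketch of how class scores from a global stream and a local stream could be combined. The abstract does not describe how GLE-Net merges its streams, so the weighted late fusion below is an assumption chosen for illustration, not the authors' mutual-promotion mechanism. The names `softmax`, `fuse_streams`, `predict`, and the `weight` parameter are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(global_scores, local_scores, weight=0.5):
    """Weighted average of the two streams' class probabilities.

    ASSUMPTION: simple late fusion, used only for illustration. The
    source does not specify how the two streams are combined.
    """
    if len(global_scores) != len(local_scores):
        raise ValueError("both streams must score the same set of classes")
    global_probs = softmax(global_scores)
    local_probs = softmax(local_scores)
    return [
        weight * g + (1.0 - weight) * l
        for g, l in zip(global_probs, local_probs)
    ]


def predict(global_scores, local_scores, weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(global_scores, local_scores, weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Two signs (classes 0 and 1) share the same hand gesture, so the global
# stream scores them almost equally; the local stream separates them.
global_scores = [2.0, 1.9, 0.1]
local_scores = [0.0, 3.0, 0.0]
print(predict(global_scores, local_scores))
```

In this toy input the global scores alone would pick class 0, while adding the local scores shifts the prediction to class 1, which mirrors the ambiguity the abstract describes for signs that differ only in non-manual features.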
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Isolated Sign Language Recognition | NMFs-CSL (Total) | Top-1 Acc | 69 | 24 |
| Isolated Sign Language Recognition | NMFs-CSL (Confusing) | Top-1 Acc | 50.6 | 24 |
| Sign Language Recognition | SLR500 | Accuracy | 96.8 | 18 |
| Isolated Sign Language Recognition | NMFS-CSL Normal (test) | Top-1 Acc | 93.6 | 14 |
| Isolated Sign Language Recognition | NMFs-CSL Normal | Top-1 Acc | 93.6 | 10 |
| Sign Language Recognition | NMFs-CSL | Top-1 Acc | 69 | 9 |
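
For readers unfamiliar with the metric, the Top-1 accuracy figures in the table are the share of test clips whose highest-scoring predicted sign matches the ground-truth label. The sketch below assumes the reported values are percentages, which the table does not state explicitly; the function name `top1_accuracy` and the sample labels are hypothetical.

```python
def top1_accuracy(predictions, labels):
    """Percentage of samples whose predicted class equals the true class.

    ASSUMPTION: results are expressed on a 0-100 scale.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Four clips, three of them classified correctly.
print(top1_accuracy([3, 1, 2, 2], [3, 1, 0, 2]))  # 75.0
```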