Improving Gloss-free Sign Language Translation by Reducing Representation Density
About
Gloss-free sign language translation (SLT) aims to build well-performing SLT systems without costly gloss annotations, but it still lags significantly behind gloss-based approaches. In this paper, we identify a representation density problem that may be a bottleneck limiting the performance of gloss-free SLT. Specifically, the visual representations of semantically distinct sign gestures tend to be closely packed together in feature space, so gloss-free methods struggle to distinguish different sign gestures and suffer a sharp performance drop. To address this problem, we introduce a simple but effective contrastive learning strategy, SignCL, which encourages gloss-free models to learn more discriminative feature representations in a self-supervised manner. Our experiments demonstrate that SignCL significantly reduces representation density and improves performance across various translation frameworks. On the CSL-Daily dataset, SignCL improves the BLEU score of the Sign Language Transformer and GFSLT-VLP by 39% and 46%, respectively, without any increase in model parameters. Compared to Sign2GPT, a state-of-the-art method based on large-scale pre-trained vision and language models, SignCL achieves better performance with only 35% of its parameters. The implementation and checkpoints are available at https://github.com/JinhuiYE/SignCL.
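To make the idea of a self-supervised contrastive objective over sign features concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that temporally adjacent frames act as positives and temporally distant frames act as negatives, combined in a triplet-style margin loss. The function name `frame_contrastive_loss` and the parameters `near`, `far`, and `margin` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_contrastive_loss(frames, near=1, far=10, margin=1.0, rng=None):
    """Triplet-style margin loss over a sequence of frame features.

    Assumption (illustrative only): for each anchor frame, a positive is
    sampled from within `near` frames and a negative from at least `far`
    frames away. Anchors without a valid positive or negative are skipped.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    n = len(frames)
    losses = []
    for i in range(n):
        positives = [j for j in range(max(0, i - near), min(n, i + near + 1)) if j != i]
        negatives = [j for j in range(n) if abs(j - i) >= far]
        if not positives or not negatives:
            continue
        d_pos = euclidean(frames[i], frames[rng.choice(positives)])
        d_neg = euclidean(frames[i], frames[rng.choice(negatives)])
        # Penalize anchors whose negative is not at least `margin` farther
        # away than their positive.
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Densely packed features: every frame maps to the same point.
dense = [[0.0, 0.0] for _ in range(20)]
# Spread-out features: frames move steadily apart over time.
spread = [[float(i), 0.0] for i in range(20)]

dense_loss = frame_contrastive_loss(dense)
spread_loss = frame_contrastive_loss(spread)
```

In this toy setup, the collapsed sequence is penalized while the well-separated sequence is not, which is the behavior a density-reducing objective is meant to have. In a real model the same kind of term would be computed on learned visual features and added to the translation loss with a weighting factor.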
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Sign Language Translation | PHOENIX-2014T (test) | BLEU-4 23.46 | 159 |
| Sign Language Translation | CSL-Daily (test) | BLEU-4 16.16 | 99 |
| Sign Language Recognition | PHOENIX-2014T (test) | WER 0.6333 | 41 |
| Sign Language Translation | CSL-Daily | BLEU Score 10.35 | 9 |
| Representation Density | PHOENIX-2014T | SDR 81.3 | 5 |
| Sign Language Translation | PHOENIX-2014T | Joint-SLT 14.76 | 5 |
| Sign Language Recognition | CSL-Daily | WER 80.71 | 3 |
| Sign Language Representation Analysis | CSL-Daily | SDR 68.39 | 3 |