Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

About

Automatic Sign Language Translation requires the integration of both computer vision and natural language processing to effectively bridge the communication gap between sign and spoken languages. However, the deficiency in large-scale training data to support sign language translation means we need to leverage resources from spoken language. We introduce, Sign2GPT, a novel framework for sign language translation that utilizes large-scale pretrained vision and language models via lightweight adapters for gloss-free sign language translation. The lightweight adapters are crucial for sign language translation, due to the constraints imposed by limited dataset sizes and the computational requirements when training with long sign videos. We also propose a novel pretraining strategy that directs our encoder to learn sign representations from automatically extracted pseudo-glosses without requiring gloss order information or annotations. We evaluate our approach on two public benchmark sign language translation datasets, namely RWTH-PHOENIX-Weather 2014T and CSL-Daily, and improve on state-of-the-art gloss-free translation performance with a significant margin.

Ryan Wong, Necati Cihan Camgoz, Richard Bowden• 2024

Related benchmarks

Task	Dataset	Result
Sign Language Translation	PHOENIX-2014T (test)	BLEU-422.52	183
Sign Language Translation	CSL-Daily (test)	BLEU-415.4	158
Sign Language Translation	CSL-Daily (dev)	BLEU-47.06	115
Sign Language Translation	PHOENIX14T (test)	BLEU-422.52	82
Sign Language Translation	CSL-Daily v1 (test)	ROUGE42.36	25
Sign Language Translation	BOBSL SENT (test)	B40.9	23
Sign Language Translation	CSL-Daily	ROUGE-L42.36	19
Sign Language Translation	PHOENIX14T	ROUGE48.9	18
Pseudo-gloss alignment	BSLCorpus Fair	LCS (%)57.26	4
Pseudo-gloss alignment	BSLCorpus (Poor)	LCS%34.52	4

Showing 10 of 12 rows

Other info

Follow for update

@wizwand_team Discord