SignX: Continuous Sign Recognition in Compact Pose-Rich Latent Space

About

Processing Sign Language (SL) data is complex and poses many challenges. Current approaches to sign recognition translate RGB sign language videos, via pose information, into word-based ID glosses, which uniquely identify signs. This paper proposes SignX, a novel framework for continuous sign language recognition (SLR) in a compact, pose-rich latent space. First, we construct a unified latent representation that encodes heterogeneous pose formats (SMPLer-X, DWPose, Mediapipe, PrimeDepth, and Sapiens Segmentation) into a compact, information-dense space. Second, we train a ViT-based Video-to-Pose module to extract this latent representation directly from raw videos. Finally, we develop a temporal modeling and sequence refinement method that operates entirely in this latent space. This multi-stage design achieves end-to-end SLR while significantly reducing computational cost. Experimental results demonstrate that SignX achieves SOTA accuracy on continuous SLR and translation tasks while delivering nearly a 50-fold acceleration over pixel-space baselines.

Sen Fang, Yalin Feng, Chunyu Sui, Hongbin Zhong, Yanxin Zhang, Hongwei Yi, Hezhen Hu, Dimitris N. Metaxas • 2025

Related benchmarks

Task	Dataset	Metric	Result
Sign Language Translation	PHOENIX-2014T (test)	BLEU-4	29.91	191
Sign Language Translation	CSL-Daily (test)	BLEU-4	28.58	187
Sign Language Translation	PHOENIX-2014T (dev)	BLEU-4 Score	30.08	156
Sign Language Translation	CSL-Daily (dev)	BLEU-4	28.75	125
Isolated Sign Language Recognition	WLASL 2000	P-I	68.29	30
Continuous Sign Language Recognition	RWTH 2014-T	WER	18.6	8
Continuous Sign Language Recognition	CSL-Daily	WER	24.3	7
Sign Language Recognition (Sign2Gloss)	ASLLRP (dev)	ROUGE	56.65	4
Sign Language Recognition (Sign2Gloss)	ASLLRP (test)	ROUGE	56.48	4
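
The continuous SLR rows above report word error rate (WER), where lower is better. As a reading aid, here is a minimal sketch of how WER is conventionally computed over gloss sequences: the word-level edit distance between reference and hypothesis, divided by the reference length. The function name `word_error_rate` and the sample glosses are hypothetical, and this is not the evaluation script used to produce the numbers in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between two whitespace-separated gloss strings.

    Illustrative only: published results may apply extra text
    normalisation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one gloss")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis glosses.
    prev = list(range(len(hyp) + 1))
    for i, ref_gloss in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference glosses
        for j, hyp_gloss in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_gloss == hyp_gloss else 1)
            deletion = prev[j] + 1      # reference gloss missing from hypothesis
            insertion = curr[j - 1] + 1  # extra gloss in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical gloss sequences: one substitution over three glosses.
    print(word_error_rate("MORGEN REGEN NORD", "MORGEN SONNE NORD"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many insertions.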

