SlowFast Network for Continuous Sign Language Recognition

About

The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, where each pathway operates at a distinct temporal resolution to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two feature fusion methods, carefully designed for the characteristics of CSLR: (1) Bi-directional Feature Fusion (BFF), which facilitates the transfer of dynamic semantics into spatial semantics and vice versa; and (2) Pathway Feature Enhancement (PFE), which enriches dynamic and spatial representations through auxiliary subnetworks without requiring extra inference time. As a result, our model strengthens spatial and dynamic representations in parallel. We demonstrate that the proposed framework outperforms the current state of the art on popular CSLR datasets, including PHOENIX14, PHOENIX14-T, and CSL-Daily.
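To make the idea of two pathways at distinct temporal resolutions concrete, here is a minimal sketch of how one clip could be sampled for each pathway. It only illustrates frame selection, not the network, BFF, or PFE. The function name `split_pathways` and the stride `alpha=4` are assumptions for illustration and are not taken from the paper.

```python
def split_pathways(frames, alpha=4):
    """Split one clip into (slow, fast) inputs.

    Assumption: the slow pathway keeps every `alpha`-th frame and the
    fast pathway keeps every frame. `alpha` is a hypothetical stride.
    """
    if alpha < 1:
        raise ValueError("alpha must be a positive integer")
    slow = list(frames[::alpha])  # low temporal resolution (spatial detail)
    fast = list(frames)           # full temporal resolution (motion)
    return slow, fast


# Frame indices stand in for decoded video frames.
clip = list(range(16))
slow, fast = split_pathways(clip, alpha=4)
print(slow)       # [0, 4, 8, 12]
print(len(fast))  # 16
```

With this sampling the slow input is `alpha` times shorter than the fast input, so any fusion between the two pathways has to reconcile the two temporal lengths first.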

Junseok Ahn, Youngjoon Jang, Joon Son Chung • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Continuous Sign Language Recognition | CSL-Daily (dev) | Word Error Rate (WER) 25.5 | 98 |
| Continuous Sign Language Recognition | CSL-Daily (test) | WER 24.9 | 91 |
| Continuous Sign Language Recognition | PHOENIX14-T (dev) | WER 17.7 | 75 |
| Continuous Sign Language Recognition | PHOENIX-2014T (test) | WER 18.7 | 43 |
| Continuous Sign Language Recognition | Phoenix14 (test) | WER 18.3 | 39 |
| Continuous Sign Language Recognition | Phoenix14 (dev) | WER 18 | 29 |
| Continuous Sign Language Recognition | PHOENIX 14 (dev test) | WER (Dev) 18 | 16 |
| Continuous Sign Language Recognition | PHOENIX14-T (dev test) | WER (Dev) 17.7 | 14 |
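The benchmark results are reported as Word Error Rate. As commonly defined, WER is the word-level edit distance between the predicted and reference gloss sequences (substitutions + deletions + insertions), divided by the number of reference words and expressed as a percentage. The sketch below shows that calculation; the function name `word_error_rate` and the example glosses are hypothetical, and official benchmark evaluation scripts may apply extra text normalisation that is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-separated sequences."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one previous row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical glosses: two word errors against four reference words.
print(word_error_rate("MORGEN REGEN NORD WIND", "MORGEN NORD STARK WIND"))  # 50.0
```

Lower is better: a WER of 18 means roughly 18 word-level errors per 100 reference glosses.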
