
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture

About

Recent work in speech recognition relies either on connectionist temporal classification (CTC) or on sequence-to-sequence models for character-level recognition. CTC assumes conditional independence of individual characters, whereas attention-based models can produce non-sequential alignments. A CTC loss can therefore be combined with an attention-based model to force monotonic alignments while removing the conditional independence assumption. In this paper, we use the recently proposed hybrid CTC/attention architecture for audio-visual recognition of speech in-the-wild. To the best of our knowledge, this is the first time that such a hybrid architecture has been used for audio-visual recognition of speech. We use the LRS2 database and show that the proposed audio-visual model leads to a 1.3% absolute decrease in word error rate over the audio-only model and achieves new state-of-the-art performance on the LRS2 database (7% word error rate). We also observe that the audio-visual model significantly outperforms the audio-based model (up to 32.9% absolute improvement in word error rate) for several different types of noise as the signal-to-noise ratio decreases.
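
The combination of a CTC loss with an attention-based loss described above is commonly written as a weighted sum of the two terms. The following is a minimal sketch of that interpolation, assuming both losses have already been computed as scalars for the same utterance. The function name `hybrid_loss` and the parameter `ctc_weight` are illustrative and do not come from the paper, and the abstract does not state which weight the authors used.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float) -> float:
    """Interpolate a CTC loss and an attention (seq2seq) loss.

    ctc_weight is a hypothetical hyperparameter in [0, 1]:
      1.0 -> pure CTC objective (monotonic alignment, independence assumption)
      0.0 -> pure attention objective (no alignment constraint)
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Example with made-up loss values, for illustration only.
combined = hybrid_loss(ctc_loss=2.0, attention_loss=1.0, ctc_weight=0.3)
```

Values of `ctc_weight` between the two extremes let the CTC term regularise the attention decoder toward monotonic alignments while the attention term keeps modelling dependencies between output characters.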

Stavros Petridis, Themos Stafylakis, Pingchuan Ma, Georgios Tzimiropoulos, Maja Pantic • 2018

Related benchmarks

Task | Dataset | Result | Rank
Visual-only Speech Recognition | LRS2 (test) | WER 63.5 | 63
Speech Recognition | LRS2 (test) | WER 7 | 49
Visual Speech Recognition | LRS2 | Mean WER 43.2 | 45
Audio-Visual Speech Recognition | LRS2 (test) | WER 7 | 34
Lip-reading | LRS2 (test) | WER 63.5 | 28
Visual Speech Recognition | LRS2 v0.4 (test) | WER 7 | 14
Audio-Visual Speech Recognition | LRS2 (clean) | WER 7 | 12
Automatic Visual Speech Recognition | LRS2 clean (test) | WER 7 | 12
English Transcription | LRS2 clean (test) | ASR WER 8.3 | 12
Audio Speech Recognition | LRS2 v0.4 (test) | WER 8.2 | 7
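
Every result above is a word error rate (WER). As a rough illustration of how such a figure is usually obtained, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes whitespace tokenisation and no text normalisation. The function name `word_error_rate` is illustrative, and the scoring protocol actually used for these benchmarks may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Multiplying the returned fraction by 100 gives a percentage on the same scale as the table. An "absolute" improvement, as quoted in the abstract, is the plain difference between two such percentages, not a ratio.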
