
LRS2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual-only Speech Recognition | LRS2 (test) | WER 12.6 | 63 |
| Speech Recognition | LRS2 (test) | WER 1.3 | 49 |
| Visual Speech Recognition | LRS2 | Mean WER 14.6 | 45 |
| Audio-visual Speech Recognition | LRS2 (test) | WER 1.3 | 34 |
| Audio-visual speech separation | LRS2-2Mix (test) | SI-SNRi 16 | 33 |
| Lip Reading | LRS2 (test) | WER 22.6 | 28 |
| Automatic Speech Recognition | LRS2-2Mix (test) | WER 17.74 | 18 |
| Speech Enhancement | LRS2 mixed with VGGSound noises (test) | PESQ 3.22 | 18 |
| Talking Face Generation | LRS2 (test) | SSIM 1 | 18 |
| Audio-Visual Speech Separation | LRS2 (test) | SDRi 12.46 | 14 |
| Visual Speech Recognition | LRS2 v0.4 (test) | WER 3.7 | 14 |
| English Transcription | LRS2 clean (test) | ASR WER 1.3 | 12 |
| Audio-visual speech separation | LRS2 2Mix | SDRi 15.9 | 12 |
| Audio-Visual Speech Recognition | LRS2 (clean) | WER 2.2 | 12 |
| Automatic Visual Speech Recognition | LRS2 clean (test) | WER 2.2 | 12 |
| Lip-syncing | LRS2 1 (test) | LSE-D 6.386 | 12 |
| Audio-Visual Speech Recognition | LRS2 50% visual occlusion (test) | WER (Overall) 6.4 | 10 |
| Speech Separation | LRS2-2Mix (test) | GPU RTF (s) (Forward) 0.0118 | 10 |
| Talking Face Generation | LRS2 | ID-SIM 1 | 8 |
| Audio-visual speech separation | LRS2-3Mix (test) | SI-SNRi 13.7 | 8 |
| ASR Error Correction | LRS2 (test) | WER 2.6 | 8 |
| speaker separation | LRS2 synthetic (test) | SDR 14.2 | 7 |
| Audio Speech Recognition | LRS2 v0.4 (test) | WER 3.9 | 7 |
| Talking Head Generation | LRS2 35 | LSE-C 7.287 | 6 |
| Lip synchronisation | LRS2 3 (test) | Acc (5 frames) 88.1 | 6 |
Showing 25 of 41 rows