LRS2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual-only Speech Recognition | LRS2 (test) | WER 12.6 | 77 |
| Visual Speech Recognition | LRS2 | Mean WER 14.6 | 49 |
| Speech Recognition | LRS2 (test) | WER 1.3 | 49 |
| Lip Reading | LRS2 (test) | WER 14.6 | 39 |
| Audio-visual Speech Recognition | LRS2 (test) | WER 1.3 | 34 |
| Audio-visual speech separation | LRS2-2Mix (test) | SI-SNRi 16 | 33 |
| Audio-Visual Speech Separation | LRS2 (test) | SDRi 16.9 | 23 |
| Audio-Visual Target Speaker Extraction | LRS2 2-mix (test) | DNSMOS 3.16 | 22 |
| Automatic Speech Recognition | LRS2-2Mix (test) | WER 17.74 | 18 |
| Speech Enhancement | LRS2 mixed with VGGSound noises (test) | PESQ 3.22 | 18 |
| Talking Face Generation | LRS2 (test) | SSIM 1 | 18 |
| Audio-Visual Speech Recognition | LRS2 (clean) | WER 2.2 | 16 |
| Visual Speech Recognition | LRS2 v0.4 (test) | WER 3.7 | 14 |
| English Transcription | LRS2 clean (test) | ASR WER 1.3 | 12 |
| Audio-visual speech separation | LRS2 2Mix | SDRi 15.9 | 12 |
| Automatic Visual Speech Recognition | LRS2 clean (test) | WER 2.2 | 12 |
| Lip-syncing | LRS2 1 (test) | LSE-D 6.386 | 12 |
| Video-to-Speech | LRS2 (test) | WER (Word Error Rate) 8.93 | 10 |
| Audio-Visual Speech Recognition | LRS2 50% visual occlusion (test) | WER (Overall) 6.4 | 10 |
| Speech Separation | LRS2-2Mix (test) | GPU RTF (s) (Forward) 0.0118 | 10 |
| Talking Face Generation | LRS2 | ID-SIM 1 | 8 |
| Audio-visual speech separation | LRS2-3Mix (test) | SI-SNRi 13.7 | 8 |
| ASR Error Correction | LRS2 (test) | WER 2.6 | 8 |
| Human Speech Generation | LRS2 (test) | LSE-D 7.83 | 7 |
| speaker separation | LRS2 synthetic (test) | SDR 14.2 | 7 |

Showing 25 of 57 rows