
LRS2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Visual-only Speech Recognition | LRS2 (test) | WER 12.6 | 63 |
| Visual Speech Recognition | LRS2 | Mean WER 14.6 | 49 |
| Speech Recognition | LRS2 (test) | WER 1.3 | 49 |
| Audio-visual Speech Recognition | LRS2 (test) | WER 1.3 | 34 |
| Audio-visual speech separation | LRS2-2Mix (test) | SI-SNRi 16 | 33 |
| Lip Reading | LRS2 (test) | WER 22.6 | 28 |
| Audio-Visual Speech Separation | LRS2 (test) | SDRi 16.9 | 23 |
| Audio-Visual Target Speaker Extraction | LRS2 2-mix (test) | DNSMOS 3.16 | 22 |
| Automatic Speech Recognition | LRS2-2Mix (test) | WER 17.74 | 18 |
| Speech Enhancement | LRS2 mixed with VGGSound noises (test) | PESQ 3.22 | 18 |
| Talking Face Generation | LRS2 (test) | SSIM 1 | 18 |
| Audio-Visual Speech Recognition | LRS2 (clean) | WER 2.2 | 16 |
| Visual Speech Recognition | LRS2 v0.4 (test) | WER 3.7 | 14 |
| English Transcription | LRS2 clean (test) | ASR WER 1.3 | 12 |
| Audio-visual speech separation | LRS2 2Mix | SDRi 15.9 | 12 |
| Automatic Visual Speech Recognition | LRS2 clean (test) | WER 2.2 | 12 |
| Lip-syncing | LRS2 1 (test) | LSE-D 6.386 | 12 |
| Audio-Visual Speech Recognition | LRS2 50% visual occlusion (test) | WER (Overall) 6.4 | 10 |
| Speech Separation | LRS2-2Mix (test) | GPU RTF (s) (Forward) 0.0118 | 10 |
| Talking Face Generation | LRS2 | ID-SIM 1 | 8 |
| Audio-visual speech separation | LRS2-3Mix (test) | SI-SNRi 13.7 | 8 |
| ASR Error Correction | LRS2 (test) | WER 2.6 | 8 |
| speaker separation | LRS2 synthetic (test) | SDR 14.2 | 7 |
| Audio Speech Recognition | LRS2 v0.4 (test) | WER 3.9 | 7 |
| Talking Head Generation | LRS2 35 | LSE-C 7.287 | 6 |
Showing 25 of 55 rows