Visual Speech Recognition

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Results	Updated
LRS3 (test)	Llama-AVSR	WER	0.77	240	1mo ago
LRS3 High-Resource, 433h labelled v1 (test)		WER	0.009	80	4mo ago
LRS3	Auto-AVSR	WER	0.009	63	4mo ago
LRS2	AutoAVSR	Mean WER	14.6	49	4mo ago
LRS3 Low-Resource 30h labelled v1 (test)	USR	WER	0.024	34	4mo ago
LRS3 30h labeled low-resource (test)	UASR-LLM-L	WER	25.3	28	4mo ago
DVSpeaker (Cross-scene)	NeuroLip	SI (45°)	71.68	21	3mo ago
DVSpeaker Matched-scene	Get	SI Accuracy (0°)	100	21	3mo ago
LRS3 low-resource (test)		WER	19.3	20	4mo ago
LRS3 high-resource (test)	RAVEn w/ self-training	WER	23.1	16	4mo ago
WildVSR	Auto-AVSR	WER	38.6	15	4mo ago
LRS2 v0.4 (test)	Ours (raw A + V)	WER	3.7	14	4mo ago
CMLR (Seen)	Cascade-Free Mandarin VSR	CER	20.38	12	4mo ago
LSVSR (test)	Audio-Ph	Word Error Rate	18.3	10	4mo ago
LRS3 v0.4 (test)	Ours (raw A + V)	WER	2.3	9	4mo ago
CMLR (Unseen)	Cascade-Free Mandarin VSR	CER	38.23	8	4mo ago
CMLR (test)	Lipnet	Inference Latency (ms)	52.3	8	4mo ago
CMLR	VSR model with prediction-based auxiliary tasks	Best CER	8	7	4mo ago
CNVSRC-Multi Mandarin (dev)	VALLR-Pin	CER	24.1	6	4mo ago
LRW	SyncVSR	Top-1 Accuracy	80.3	5	4mo ago
LRS3 v0.0 (test)	Ours (raw A + V)	WER	1.2	5	4mo ago
Self-Collected Dataset Mandarin (test)	VALLR-Pin	CER	32.22	4	4mo ago
CMU-MOSEAS-Spanish (CMes)	CM-seq2seq	Best Score	58.1	4	4mo ago
CMU-MOSEAS-Portuguese (CMpt) (test)	VSR model with prediction-based auxiliary tasks	Mean WER	51.6	4	4mo ago
Multilingual TEDx-Spanish (MTes) (test)	VSR model with prediction-based auxiliary tasks	Mean WER	56.6	4	4mo ago
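
Most rows above are scored by word error rate (WER) or, for the Mandarin sets such as CMLR, character error rate (CER); for both, lower is better. Each is the edit distance (substitutions S + deletions D + insertions I) between reference and hypothesis divided by the number of reference tokens N, i.e. (S + D + I) / N. The sketch below is a minimal illustration, not the scoring script used by any of these benchmarks: it assumes plain whitespace tokenization for WER, one token per character for CER, and no text normalization (real evaluation pipelines usually lowercase and strip punctuation first, and their conventions differ). The helper names edit_distance, wer and cer are hypothetical.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference token r
                curr[j - 1] + 1,         # insertion of hypothesis token h
                prev[j - 1] + (r != h),  # substitution, or free match when r == h
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace tokenization and no normalization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, treating every character (including spaces) as a token."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.1%}")
    # One substituted character out of four.
    print(cer("你好世界", "你好视界"))
```

Running it prints 33.3% and 0.25. Multiplying by 100 gives the percentage form in which leaderboards usually quote WER and CER.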

Showing 25 of 30 rows