MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

About

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation, providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X and 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.

Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang • 2023

Related benchmarks

Task	Dataset	Result
Audio-visual speech-to-text translation	MuAViC (test)	BLEU (EL->EN): 4.2	23
Speech Recognition	MuAViC (test)	Arabic Score: 82.2	9
Visual Speech Translation	MuAViC	En->It Score: 15.1	6

