
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

About

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages. It is fully transcribed and covers 6 English-to-X translation as well as 6 X-to-English translation directions. To the best of our knowledge, this is the first open benchmark for audio-visual speech-to-text translation and the largest open benchmark for multilingual audio-visual speech recognition. Our baseline results show that MuAViC is effective for building noise-robust speech recognition and translation models. We make the corpus available at https://github.com/facebookresearch/muavic.

Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang • 2023

Related benchmarks

Task | Dataset | Result | Rank
Audio-visual speech-to-text translation | MuAViC (test) | BLEU (EL->EN): 4.2 | 23
Speech Recognition | MuAViC (test) | Arabic Score: 82.2 | 9
Visual Speech Translation | MuAViC | En->It Score: 15.1 | 6
