
AfriHuBERT: A self-supervised speech representation model for African languages

About

In this work, we present AfriHuBERT, an extension of mHuBERT-147, a compact self-supervised learning (SSL) speech representation model pretrained on 147 languages. While mHuBERT-147 covered 16 African languages, we expand coverage to 1,226 through continued pretraining on 10K+ hours of speech data from diverse sources, benefiting an African population of over 600M. We evaluate AfriHuBERT on two key speech tasks, Spoken Language Identification (SLID) and Automatic Speech Recognition (ASR), using the FLEURS benchmark. Our results show a +3.6% F1 score improvement for SLID and a 2.1% average Word Error Rate (WER) reduction for ASR over mHuBERT-147, and demonstrate competitiveness with larger SSL models such as MMS and XEUS. Further analysis shows that ASR models trained on AfriHuBERT exhibit improved cross-corpus generalization and remain competitive in extremely low-resource ASR scenarios.
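For readers unfamiliar with the ASR metric reported above, Word Error Rate (WER) is the word-level edit distance between a reference transcript and a model hypothesis, normalized by the reference length. A minimal sketch (standard Levenshtein dynamic programming, not the authors' evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat", "the cat sang")` is 1/3: one substitution over a three-word reference. The "2.1% average WER reduction" in the abstract is an absolute drop in this quantity, averaged over the evaluated languages.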

Jesujoba O. Alabi, Xuechen Liu, Dietrich Klakow, Junichi Yamagishi • 2024

Related benchmarks

Task                          Dataset                 Result          Rank
Language Identification       FLEURS SSA 102 (test)   Accuracy: 93.5  8
Automatic Speech Recognition  FLEURS SSA 102 (test)   AFR CER: 12     5
