Ara-BEST-RQ: Multi-Dialectal Arabic SSL
About
We present Ara-BEST-RQ, a family of self-supervised learning (SSL) models designed specifically for multi-dialectal Arabic speech processing. Leveraging 5,640 hours of crawled Creative Commons speech combined with publicly available datasets, we pre-train Conformer-based BEST-RQ models with up to 600M parameters. Our models are evaluated on dialect identification (DID) and automatic speech recognition (ASR) tasks, achieving state-of-the-art performance on the former while using fewer parameters than competing models. We demonstrate that family-targeted pre-training on Arabic dialects significantly improves downstream performance compared to multilingual or monolingual models trained on non-Arabic data. All models, code, and pre-processed datasets will be publicly released to support reproducibility and further research in Arabic speech technologies.
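For readers unfamiliar with BEST-RQ pre-training, the core idea is that discrete training targets come from a *frozen* random-projection quantizer: each speech frame is projected by a fixed random matrix and assigned the index of the nearest vector in a fixed random codebook, and the SSL model learns to predict these indices for masked frames. The snippet below is a minimal illustrative sketch of that target-generation step, not the released code; all dimensions and names (`FEAT_DIM`, `PROJ_DIM`, `CODEBOOK_SIZE`, `bestrq_targets`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 80         # e.g. log-mel filterbank dimension (assumed)
PROJ_DIM = 16         # random projection dimension (assumed)
CODEBOOK_SIZE = 8192  # number of discrete targets (assumed)

# Both the projection and the codebook are random, frozen, and never trained.
projection = rng.normal(size=(FEAT_DIM, PROJ_DIM))
codebook = rng.normal(size=(CODEBOOK_SIZE, PROJ_DIM))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)  # unit-norm codes

def bestrq_targets(frames: np.ndarray) -> np.ndarray:
    """Map (T, FEAT_DIM) speech frames to (T,) integer codebook indices."""
    z = frames @ projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # normalize projections
    # Nearest codebook entry; with unit-norm vectors, maximizing cosine
    # similarity is equivalent to minimizing L2 distance.
    return np.argmax(z @ codebook.T, axis=1)

frames = rng.normal(size=(100, FEAT_DIM))  # 100 dummy speech frames
targets = bestrq_targets(frames)
print(targets.shape)  # (100,)
```

During pre-training, spans of frames are masked and the Conformer encoder is trained with a cross-entropy loss to predict these frozen indices at the masked positions.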
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | TARIC-SLU (test) | WER | 21.14 | 6 |
| Automatic Speech Recognition | Common Voice Arabic 19.0 (test) | WER | 18.59 | 6 |
| Automatic Speech Recognition | MGB-3 (test) | WER | 28.78 | 6 |
| Automatic Speech Recognition | MGB-5 (test) | WER | 54.18 | 6 |
| Dialect Identification | ADI-20 (val) | Accuracy | 97.21 | 4 |
| Dialect Identification | ADI-20 (test) | Accuracy | 96.02 | 4 |