AfroScope: A Framework for Studying the Linguistic Landscape of Africa

About

Language Identification (LID) is the task of determining the language of a given text and is a fundamental preprocessing step that affects the reliability of downstream NLP applications. While recent work has expanded LID coverage for African languages, existing approaches remain limited in (i) the number of supported languages and (ii) their ability to make fine-grained distinctions among closely related varieties. We introduce AfroScope, a unified framework for African LID that includes AfroScope-Data, a dataset covering 713 African languages, and AfroScope-Models, a suite of strong LID models with broad language coverage. To better distinguish highly confusable languages, we propose a hierarchical classification approach that leverages Mirror-Serengeti, a specialized embedding model targeting 29 closely related or geographically proximate languages. This approach improves macro F1 by 4.55 on this confusable subset compared to our best base model. Finally, we analyze cross-linguistic transfer and domain effects, offering guidance for building robust African LID systems. We position African LID as an enabling technology for large-scale measurement of Africa's linguistic landscape in digital text and release AfroScope-Data and AfroScope-Models publicly.
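The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `CONFUSABLE` set, and the nearest-centroid cosine rule are all hypothetical stand-ins, chosen only to show how a base LID prediction might be handed off to a specialized embedding model when it falls inside a confusable group.

```python
from math import sqrt

# Hypothetical confusable group. The 29 languages targeted by
# Mirror-Serengeti are not listed on this page, so placeholders are used.
CONFUSABLE = {"lang_a", "lang_b"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hierarchical_predict(text, base_predict, embed, centroids, confusable=CONFUSABLE):
    """Stage 1: base LID over all languages.
    Stage 2: if the base label is in the confusable set, re-decide among
    the confusable languages using embedding similarity to per-language centroids.

    base_predict: callable text -> label (assumed base LID model)
    embed:        callable text -> vector (assumed specialized embedding model)
    centroids:    dict label -> vector, one entry per confusable language
    """
    label = base_predict(text)
    if label not in confusable:
        return label  # outside the confusable group, trust the base model
    vec = embed(text)
    candidates = sorted(lang for lang in confusable if lang in centroids)
    if not candidates:
        return label  # nothing to compare against, fall back to the base label
    return max(candidates, key=lambda lang: cosine(vec, centroids[lang]))
```

The design choice illustrated here is that the specialized model is only consulted for inputs the base model already places in the confusable group, so its decisions cannot affect predictions for other languages.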

Sang Yun Kwon, AbdelRahim Elmadany, Muhammad Abdul-Mageed • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Hierarchical classification | AfroScope-Data confusable (test) | Macro F1: 98 | 36 |
| Language Identification | AfroScope High resource | Macro-F1: 100 | 16 |
| Language Identification | AfroScope-Data Mid resource | Macro F1: 98.67 | 8 |
| Language Identification | Afroscope | Macro-F1: 97.83 | 5 |
| Language Identification | BLOOM | Macro F1: 95.76 | 5 |
| Language Identification | FineWeb2 | Macro F1: 94.52 | 5 |
| Language Identification | Mafand | Macro F1: 93.54 | 5 |
| Language Identification | MCS-350 | Macro F1 Score: 70.38 | 5 |
| Language Identification | Smol | Macro F1: 90.02 | 5 |
| Language Identification | UDHR | Macro F1: 89.68 | 5 |
Showing 10 of 11 rows
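
The results listed above are macro F1 scores, which average per-language F1 with equal weight so that low-resource languages count as much as high-resource ones. The sketch below shows the standard definition on toy labels. The 0 to 100 scaling is an assumption based on how the scores appear on this page, and the language codes in the usage note are illustrative only.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over all labels seen in y_true or y_pred,
    returned on a 0-100 scale (assumed to match the scores on this page)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)
```

For example, with gold labels `["hau", "hau", "yor", "ibo"]` and predictions `["hau", "yor", "yor", "ibo"]`, the per-language F1 values are 2/3, 2/3, and 1, so the macro F1 is about 77.78.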
