Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

About

Automated phoneme-level pronunciation assessment is vital for scalable speech therapy and language learning, yet validated tools for Arabic remain scarce. We present Harf-Speech, a modular system that scores Arabic pronunciation at the phoneme level on a clinical scale. It combines a Modern Standard Arabic (MSA) phonetizer, a fine-tuned speech-to-phoneme model, Levenshtein alignment, and a blended scorer that uses longest common subsequence and edit-distance metrics. We fine-tune three ASR architectures on Arabic phoneme data and benchmark them against zero-shot multimodal models; the best, OmniASR-CTC-1B-v2, achieves an 8.92% phoneme error rate. For clinical validation, three certified speech-language pathologists (SLPs) independently scored 40 utterances. Harf-Speech attains a Pearson correlation of 0.791 and an ICC(2,1) of 0.659 with mean expert scores, outperforming existing end-to-end assessment frameworks. These results show that Harf-Speech yields clinically aligned, interpretable scores comparable to inter-rater expert agreement.
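The abstract names the scorer's ingredients (edit distance, longest common subsequence, and a blend of the two) without giving the formula. The following is a minimal sketch of how such a blend could be computed over phoneme sequences, not the authors' implementation. The function names, the equal-weight default `lcs_weight=0.5`, the normalizations, and the placeholder phoneme symbols are all assumptions made for illustration; the mapping from this 0-1 similarity to the clinical scale is not specified in the abstract and is left out.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Length of the longest common subsequence of two phoneme sequences."""
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, 1):
            curr.append(prev[j - 1] + 1 if r == h else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def blended_score(ref: Sequence[str], hyp: Sequence[str],
                  lcs_weight: float = 0.5) -> float:
    """Hypothetical blend of LCS and edit-distance similarity, in [0, 1].

    The weight and both normalizations are assumptions, not values
    taken from the paper.
    """
    if not ref:
        raise ValueError("reference sequence must not be empty")
    lcs_sim = lcs_length(ref, hyp) / len(ref)
    edit_sim = 1.0 - edit_distance(ref, hyp) / max(len(ref), len(hyp))
    return lcs_weight * lcs_sim + (1.0 - lcs_weight) * edit_sim


# Placeholder phoneme symbols; one substitution in a five-phoneme target.
target = ["k", "i", "t", "a", "b"]
spoken = ["k", "i", "d", "a", "b"]
per = phoneme_error_rate(target, spoken)
score = blended_score(target, spoken)
```

Here `target` stands in for the phonetizer output and `spoken` for the speech-to-phoneme model output. Keeping the two similarity terms separate makes it easy to inspect which phonemes were missed, which is the kind of interpretability the abstract emphasizes.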

Asif Azad, MD Sadik Hossain Shanto, Mohammad Sadat Hossain, Bdour Alwuqaysi, Sabri Boughorbel, Yahya Bokhari, Abdulrhman Aljouie, Ayah Othman Sindi, Ehsan Hoque • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Inter-SLP Agreement) | PCC 0.696 | 5 |
| Phoneme Recognition | IqraEval 500 samples (val) | -- | 5 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: SLP 1) | PCC 0.798 | 2 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: SLP 3) | PCC 0.795 | 2 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: Mean SLP) | PCC 0.791 | 2 |
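
The PCC values above are Pearson correlation coefficients between system scores and SLP ratings. As a reminder of what that statistic measures, here is a minimal sketch of the computation in plain Python. The function name `pearson` and the input lists are illustrative assumptions; the numbers are toy values, not data from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: hypothetical system scores and hypothetical rater scores.
system_scores = [1.0, 2.0, 3.0]
rater_scores = [2.0, 4.0, 6.0]
r = pearson(system_scores, rater_scores)
```

Pearson correlation captures linear association only, which is presumably why the abstract also reports ICC(2,1), an agreement measure that is sensitive to differences in absolute score level.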