Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
About
Automated phoneme-level pronunciation assessment is vital for scalable speech therapy and language learning, yet validated tools for Arabic remain scarce. We present Harf-Speech, a modular system that scores Arabic pronunciation at the phoneme level on a clinical scale. It combines an MSA phonetizer, a fine-tuned speech-to-phoneme model, Levenshtein alignment, and a blended scorer using longest-common-subsequence and edit-distance metrics. We fine-tune three ASR architectures on Arabic phoneme data and benchmark them against zero-shot multimodal models; the best, OmniASR-CTC-1B-v2, achieves an 8.92% phoneme error rate. For clinical validation, three certified speech-language pathologists independently scored 40 utterances. Harf-Speech attains a Pearson correlation of 0.791 and an ICC(2,1) of 0.659 with mean expert scores, outperforming existing end-to-end assessment frameworks. These results show that Harf-Speech yields clinically aligned, interpretable scores comparable to inter-rater expert agreement.
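The blended scorer described above combines LCS and edit-distance similarity over aligned phoneme sequences. The sketch below illustrates one plausible formulation; the function names, the normalization by sequence length, and the equal blending weight are our assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a blended LCS + edit-distance phoneme scorer.
# The blending weight w and the normalization scheme are illustrative
# assumptions, not the published Harf-Speech configuration.

def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def lcs_length(ref, hyp):
    """Length of the longest common subsequence of two phoneme sequences."""
    m, n = len(ref), len(hyp)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == hyp[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]

def blended_score(ref, hyp, w=0.5):
    """Blend LCS similarity and edit-distance similarity into [0, 1]."""
    if not ref:
        return 1.0 if not hyp else 0.0
    longest = max(len(ref), len(hyp))
    lcs_sim = lcs_length(ref, hyp) / longest
    ed_sim = 1.0 - edit_distance(ref, hyp) / longest
    return w * lcs_sim + (1 - w) * ed_sim

# Toy example: one substituted final phoneme out of five.
ref = ["k", "i", "t", "a:", "b"]
hyp = ["k", "i", "t", "a:", "p"]
print(round(blended_score(ref, hyp), 3))  # → 0.8
```

With one substitution in a five-phoneme word, both similarity terms equal 0.8, so the blend is 0.8; mapping this continuous score onto the clinical scale would be a separate calibration step.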
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Inter-SLP Agreement) | PCC 0.696 | 5 |
| Phoneme Recognition | IqraEval 500 samples (val) | -- | 5 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: SLP 1) | PCC 0.798 | 2 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: SLP 3) | PCC 0.795 | 2 |
| Phoneme-level Arabic speech assessment | Arabic clinical speech dataset (Target: Mean SLP) | PCC 0.791 | 2 |
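The PCC values in the table compare system scores against SLP ratings. As a reminder of what is being computed, here is a minimal Pearson-correlation sketch; the score values are made-up toy data, not the evaluation set.

```python
# Minimal Pearson correlation (PCC) between system scores and expert
# scores. The data below are illustrative placeholders only.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system_scores = [0.9, 0.7, 0.4, 0.8]   # hypothetical Harf-Speech outputs
expert_scores = [0.85, 0.6, 0.5, 0.9]  # hypothetical mean SLP ratings
print(round(pearson(system_scores, expert_scores), 3))  # → 0.879
```

In the paper's evaluation this statistic, computed over the 40 clinically scored utterances, yields the PCC 0.791 reported against the mean SLP scores.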