Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach

About

Mispronunciation Detection and Diagnosis (MDD) is crucial for language learning and speech therapy. Unlike conventional methods that require scoring models or trained phoneme-level models, we propose a novel training-free framework that combines retrieval techniques with a pretrained automatic speech recognition (ASR) model. Our method requires neither phoneme-specific modeling nor additional task-specific training, yet still detects and diagnoses pronunciation errors accurately. Experiments on the L2-ARCTIC dataset show that our method achieves a superior F1 score of 69.60% while avoiding the complexity of model training.
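The abstract does not spell out how the retrieval step works. As a rough illustration only, the sketch below makes several assumptions: each phone segment of an utterance has already been turned into a fixed-size embedding by a pretrained ASR encoder, segments are already aligned one-to-one with the canonical phones, and a small datastore of phone-labeled embeddings exists. Each segment is labeled by k-nearest-neighbour majority vote. A mispronunciation is flagged wherever the retrieved label differs from the canonical phone, and the retrieved label serves as the diagnosis. The names `retrieve_phone` and `detect_and_diagnose`, the datastore layout, and the toy 2-D vectors are all hypothetical. This is a generic k-NN sketch, not the authors' pipeline.

```python
import math
from collections import Counter

# Hypothetical datastore layout: (embedding, phone label) pairs.
# Real embeddings would come from a pretrained ASR encoder; the toy
# 2-D vectors used below are made up for illustration.
Datastore = list[tuple[list[float], str]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_phone(query: list[float], store: Datastore, k: int = 3) -> str:
    """Label a segment by majority vote over its k most similar datastore entries."""
    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def detect_and_diagnose(segments, canonical, store, k=3):
    """Compare retrieved phones with canonical phones, position by position.

    Assumes one segment embedding per canonical phone (alignment done upstream).
    Returns a list of (canonical, retrieved, is_mispronounced) tuples.
    """
    results = []
    for emb, target in zip(segments, canonical):
        predicted = retrieve_phone(emb, store, k)
        results.append((target, predicted, predicted != target))
    return results


if __name__ == "__main__":
    store = [
        ([1.0, 0.0], "IY"), ([0.9, 0.1], "IY"), ([0.95, 0.05], "IY"),
        ([0.0, 1.0], "IH"), ([0.1, 0.9], "IH"), ([0.05, 0.95], "IH"),
    ]
    # The learner intends "IY" twice, but the second segment sits near "IH".
    segments = [[0.97, 0.03], [0.08, 0.92]]
    for target, predicted, wrong in detect_and_diagnose(segments, ["IY", "IY"], store):
        print(target, predicted, "MISPRONOUNCED" if wrong else "ok")
```

In this toy run the first segment is accepted and the second is flagged, with "IH" reported as the diagnosis. Majority voting over k neighbours, rather than taking only the single nearest entry, makes the label less sensitive to one noisy datastore item.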

Huu Tuong Tu, Ha Viet Khanh, Tran Tien Dat, Vu Huan, Thien Van Luong, Nguyen Tien Cuong, Nguyen Thi Thu Trang• 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mispronunciation Detection	L2-ARCTIC (test)	F1 score (%)	69.6	20
Mispronunciation Diagnosis	L2-ARCTIC (test)	EDR	37.77	14
Phoneme Recognition	L2-ARCTIC (test)	Phoneme Error Rate, PER (%)	104.1	14
Mispronunciation Detection and Diagnosis	L2-ARCTIC 6-speaker subset (test)	F1 score (%)	69.6	7
Automatic Speech Recognition	L2-ARCTIC 6-speaker subset (test)	PER (%)	104.1	5
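The table reports F1 for detection and phoneme error rate (PER) for recognition. A PER above 100% is possible because PER divides all substitutions, deletions, and insertions by the number of reference phones, so many insertions can push it past 100. The sketch below computes PER with a standard Levenshtein alignment. It also computes detection precision, recall, and F1 from the true-rejection / false-rejection / false-acceptance counts commonly used in MDD evaluation. It assumes, without confirmation from the page, that the reported F1 follows that convention. The function names and example inputs are illustrative, not taken from the paper.

```python
def edit_ops(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phone r
                curr[j - 1] + 1,         # insert hypothesis phone h
                prev[j - 1] + (r != h),  # substitute (free when phones match)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: list[str], hyp: list[str]) -> float:
    """PER in percent: (S + D + I) / N * 100, with N = len(ref)."""
    return 100.0 * edit_ops(ref, hyp) / len(ref)


def detection_f1(tr: int, fr: int, fa: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for mispronunciation detection.

    tr: mispronounced phones correctly rejected (flagged)
    fr: correctly pronounced phones falsely rejected
    fa: mispronounced phones falsely accepted
    """
    precision = tr / (tr + fr) if tr + fr else 0.0
    recall = tr / (tr + fa) if tr + fa else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One substitution (AE -> EH) plus two insertions over two reference phones.
    print(phoneme_error_rate(["AE", "T"], ["EH", "T", "AH", "S"]))  # 150.0
    print(detection_f1(tr=60, fr=20, fa=30))
```

In the first example, three edits against only two reference phones give a PER of 150%. This shows how a figure such as 104.1 can arise when the recognizer inserts many extra phones.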
