
Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

About

Mispronunciation Detection and Diagnosis (MDD) requires modeling fine-grained acoustic deviations. However, current ASR-derived MDD systems often face inherent limitations. In particular, CTC-based models favor sequence-level alignments that neglect transient mispronunciation cues, while explicit canonical priors bias predictions toward intended targets. To address these bottlenecks, we propose a prompt-free framework decoupling acoustic fidelity from canonical guidance. First, we introduce CROTTC, an acoustic model enforcing monotonic, frame-level alignment to accurately capture pronunciation deviations. Second, we implicitly inject mispronunciation information via the IF strategy under the knowledge transfer principle. Experiments show CROTTC-IF achieves a 71.77% F1-score on L2-ARCTIC and 71.70% F1-score on the Iqra'Eval2 leaderboard. With empirical analysis, we demonstrate that decoupling acoustics from explicit priors provides highly robust MDD.

Haopeng Geng, Longfei Yang, Xi Chen, Haitong Sun, Daisuke Saito, Nobuaki Minematsu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mispronunciation Detection | L2-ARCTIC (test) | F1 Score | 71.77 | 20 |
| Mispronunciation Diagnosis | L2-ARCTIC (test) | EDR | 21.98 | 14 |
| Phoneme Recognition | L2-ARCTIC (test) | Phoneme Error Rate (PER) | 15.42 | 14 |
| Mispronunciation Detection and Diagnosis | ERJ (test) | F1 Score | 89.27 | 6 |
| Mispronunciation Detection and Diagnosis | SO762 (test) | F1 Score | 57.16 | 6 |
| Mispronunciation Detection and Diagnosis | Iqra’Eval2 Leaderboard (test) | F1 Score | 71.7 | 5 |
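The benchmark entries report PER and F1 Score without defining them on this page. As a rough illustration only, the sketch below computes both under the conventional definitions: PER as the Levenshtein distance between reference and recognized phoneme sequences divided by the reference length, and detection F1 with "mispronounced" as the positive class. The function names (`edit_distance`, `phoneme_error_rate`, `detection_f1`) and the toy inputs are hypothetical, and the paper's exact scoring protocol may differ.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions, deletions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phoneme
                curr[j - 1] + 1,         # insertion of a hypothesis phoneme
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent (assumed definition: edit distance / reference length)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def detection_f1(is_mispronounced: Sequence[bool], flagged: Sequence[bool]) -> float:
    """F1 in percent, treating 'mispronounced' as the positive class (assumption)."""
    if len(is_mispronounced) != len(flagged):
        raise ValueError("label and prediction sequences must have equal length")
    pairs = list(zip(is_mispronounced, flagged))
    tp = sum(1 for m, f in pairs if m and f)      # mispronounced and flagged
    fp = sum(1 for m, f in pairs if not m and f)  # correct but flagged
    fn = sum(1 for m, f in pairs if m and not f)  # mispronounced but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: one substitution (DH -> D) and one deletion (T).
    print(phoneme_error_rate(["DH", "AH", "K", "AE", "T"], ["D", "AH", "K", "AE"]))  # 40.0
    # Toy example: one hit, one false alarm, one miss.
    print(detection_f1([True, False, True, False], [True, True, False, False]))  # 50.0
```

Both functions return percentages so the values are on the same scale as the table entries; lower is better for PER, higher is better for F1.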
