XPPG-PCA: Reference-free automatic speech severity evaluation with principal components
About
Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. Automated methods exist, but they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, which restricts them to read speech and limits their applicability. Existing reference-free methods are also flawed: supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or exceeds, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speech severity evaluation | NKI-OC-VC nspk 15 (total) | Pearson correlation coefficient (r) | 0.9 | 20 |
| Speech severity evaluation | NKI-SpeechRT nspk 54 (total) | Pearson correlation coefficient (r) | 0.8414 | 20 |
| Speech severity evaluation | NKI-RUG-UMCG nspk 8 (total) | Pearson correlation coefficient (r) | 0.9598 | 20 |
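
The benchmark results above are reported as Pearson correlation coefficients (r). As a minimal sketch of how such a figure could be computed, the snippet below correlates automatic severity scores with expert ratings. The function name `pearson_r` and all numeric values are hypothetical and purely illustrative; they are not taken from the paper or its datasets, and this is not the authors' evaluation code.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D, equal length, with at least 2 items")
    # Center both variables around their means.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    # Covariance divided by the product of the standard deviations.
    return float((xc * yc).sum() / denom)


# Hypothetical per-speaker values (illustrative only, not real data).
expert_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
automatic_scores = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(automatic_scores, expert_ratings)
print(f"Pearson r = {r:.4f}")
```

A value near 1 or -1 indicates a strong linear relationship between the automatic scores and the expert ratings, while a value near 0 indicates none.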