
XPPG-PCA: Reference-free automatic speech severity evaluation with principal components

About

Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. Automated methods exist, but they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed: supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or better than, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.

Bence Mark Halpern, Thomas B. Tienkamp, Teja Rebernik, Rob J.J.H. van Son, Sebastiaan A.H.J. de Visscher, Max J.H. Witjes, Defne Abur, Tomoki Toda • 2025

Related benchmarks

Task | Dataset | Result | Rank
Speech severity evaluation | NKI-OC-VC nspk 15 (total) | Pearson Correlation Coefficient (r): 0.9 | 20
Speech severity evaluation | NKI-SpeechRT nspk 54 (total) | Pearson Correlation Coefficient: 0.8414 | 20
Speech severity evaluation | NKI-RUG-UMCG nspk 8 (total) | Pearson Correlation Coefficient (r): 0.9598 | 20
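
The benchmark results above are reported as Pearson correlation coefficients. As a minimal sketch of how such a value is computed, the snippet below correlates two lists of per-speaker numbers. It assumes the metric compares automatic severity scores with reference severity ratings, which this page does not state explicitly. The function name `pearson_r` and all data values are hypothetical and invented for illustration; they are not taken from the paper or its open-source implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Unnormalised covariance and variances (the 1/n factors cancel in the ratio)
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-speaker values, made up for this example only
reference_ratings = [1.0, 2.5, 3.0, 4.5, 5.0]
automatic_scores = [0.8, 2.9, 3.1, 4.0, 5.2]

r = pearson_r(reference_ratings, automatic_scores)
print(f"r = {r:.3f}")
```

A value near 1 indicates that the automatic scores rise almost linearly with the reference ratings. With few speakers per dataset (8 to 54 in the rows above), a single outlying speaker can shift r noticeably.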