Multivariate Probabilistic Assessment of Speech Quality

About

The mean opinion score (MOS) is a standard metric for assessing speech quality, but as a single number it cannot identify which specific distortions are responsible when a low score is observed. The NISQA dataset addresses this limitation by providing ratings across four additional dimensions (noisiness, coloration, discontinuity, and loudness) alongside MOS. In this paper, we extend previously explored univariate MOS estimation to a multivariate framework by modeling these dimensions jointly with a multivariate Gaussian distribution. Our approach uses a Cholesky decomposition to predict covariances without imposing restrictive assumptions, and extends probabilistic affine transformations to the multivariate setting. Experimental results show that our model performs on par with state-of-the-art methods in point estimation, while uniquely providing uncertainty and correlation estimates across speech quality dimensions. This enables better diagnosis of poor speech quality and informs targeted improvements.
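To make the covariance idea concrete, here is a minimal sketch of how a Cholesky factor can parameterize a five-dimensional Gaussian over the quality dimensions. It is an illustration under stated assumptions, not the authors' implementation: the softplus on the diagonal, the `eps` jitter, the function names, and the made-up input values are all hypothetical choices. The affine step uses only the standard Gaussian identity that if y ~ N(mu, Sigma) then Ay + b ~ N(A mu + b, A Sigma A^T).

```python
import math

# Dimension names follow the abstract; everything else here is illustrative.
DIMS = ["mos", "noisiness", "coloration", "discontinuity", "loudness"]
D = len(DIMS)


def softplus(x):
    # Numerically stable log(1 + exp(x)); always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def build_cholesky(raw, eps=1e-6):
    """Fill a lower-triangular D x D factor row by row from D*(D+1)/2 raw values."""
    assert len(raw) == D * (D + 1) // 2
    L = [[0.0] * D for _ in range(D)]
    k = 0
    for i in range(D):
        for j in range(i + 1):
            # Assumption: a positive diagonal (softplus + eps) keeps L L^T positive definite.
            L[i][j] = softplus(raw[k]) + eps if i == j else raw[k]
            k += 1
    return L


def covariance(L):
    """Sigma = L L^T."""
    return [[sum(L[i][k] * L[j][k] for k in range(D)) for j in range(D)]
            for i in range(D)]


def correlation(sigma):
    """Normalize a covariance matrix to a correlation matrix."""
    std = [math.sqrt(sigma[i][i]) for i in range(D)]
    return [[sigma[i][j] / (std[i] * std[j]) for j in range(D)] for i in range(D)]


def gaussian_nll(y, mu, L):
    """Negative log-likelihood of y under N(mu, L L^T), via forward substitution."""
    z = []
    for i in range(D):
        acc = y[i] - mu[i] - sum(L[i][j] * z[j] for j in range(i))
        z.append(acc / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(D))
    return 0.5 * (sum(v * v for v in z) + log_det + D * math.log(2.0 * math.pi))


def affine_transform(mu, sigma, A, b):
    """If y ~ N(mu, Sigma), then A y + b ~ N(A mu + b, A Sigma A^T)."""
    n = len(A)
    new_mu = [sum(A[i][k] * mu[k] for k in range(D)) + b[i] for i in range(n)]
    a_sigma = [[sum(A[i][k] * sigma[k][j] for k in range(D)) for j in range(D)]
               for i in range(n)]
    new_sigma = [[sum(a_sigma[i][k] * A[j][k] for k in range(D)) for j in range(n)]
                 for i in range(n)]
    return new_mu, new_sigma


# Made-up stand-ins for what a network head might output for one utterance.
raw = [0.1 * (k - 7) for k in range(D * (D + 1) // 2)]
mu = [3.2, 3.8, 3.5, 4.0, 3.9]

L = build_cholesky(raw)
sigma = covariance(L)      # joint uncertainty over the five dimensions
corr = correlation(sigma)  # pairwise correlations between dimensions
nll = gaussian_nll([3.0, 4.0, 3.5, 4.2, 3.7], mu, L)
```

Because any lower-triangular factor with a positive diagonal yields a valid covariance, a parameterization like this needs no structural restriction such as a diagonal covariance. In practice one would likely use a tensor library rather than nested lists; plain Python is used here only to keep the sketch dependency-free.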

Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Content Usefulness (CU) Assessment | AES-Natural | SRCC 0.961 | 9 |
| Audio Production Complexity (PC) Assessment | AES-Natural | SRCC 0.947 | 9 |
| Audio Production Quality (PQ) Assessment | AES-Natural | SRCC 0.942 | 9 |
| Audio Content Enjoyment (CE) Assessment | AES-Natural | SRCC 0.938 | 9 |