
Speech Quality Assessment through MOS using Non-Matching References

About

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals. However, several recent deep learning approaches to automatically estimating MOS lack robustness and generalization, which limits their use in real-world applications. In this work, we present a novel framework, NORESQA-MOS, for estimating the MOS of a speech signal. Unlike prior works, our approach uses non-matching references as a form of conditioning to ground the MOS estimation by neural networks. We show that NORESQA-MOS generalizes better and estimates MOS more robustly than previous state-of-the-art methods such as DNSMOS and NISQA, even though it is trained on a smaller dataset. We also show that our generic framework can be combined with other learning methods, such as self-supervised learning, and can further build on the benefits of these methods.
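The abstract rests on two ideas that a short sketch can make concrete: an MOS is the average of listener ratings, and a non-matching reference (a clip with a known MOS whose content differs from the test clip) can anchor an estimate. The code below is a minimal illustration under stated assumptions, not the NORESQA-MOS model or its API: `predict_delta` is a hypothetical stand-in for a trained network that returns the predicted MOS difference between a test clip and a reference, and averaging the per-reference estimates is likewise an assumption made for illustration.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Sequence


def mos(ratings: Sequence[int]) -> float:
    """Mean Opinion Score: the average of listener ratings (commonly on a 1-5 scale)."""
    if not ratings:
        raise ValueError("need at least one rating")
    return float(mean(ratings))


def estimate_mos_with_nmr(
    test_clip: object,
    references: Sequence[tuple[object, float]],
    predict_delta: Callable[[object, object], float],
) -> float:
    """Hypothetical NMR-conditioned MOS estimate (illustrative assumption).

    Each reference is a (clip, known_mos) pair. predict_delta(test, ref) is
    ASSUMED to return the predicted MOS difference (test minus ref). Each
    reference yields one grounded estimate, and the estimates are averaged so
    the result does not hinge on any single reference.
    """
    if not references:
        raise ValueError("need at least one non-matching reference")
    estimates = [
        ref_mos + predict_delta(test_clip, ref_clip)
        for ref_clip, ref_mos in references
    ]
    return float(mean(estimates))


if __name__ == "__main__":
    # Toy setup: a "clip" is just its hidden true quality, and the
    # hypothetical predictor returns the exact difference.
    refs = [(2.0, 2.0), (4.5, 4.5), (3.0, 3.0)]
    print(mos([4, 5, 3, 4]))
    print(estimate_mos_with_nmr(3.8, refs, lambda t, r: t - r))
```

With a perfect toy predictor every reference agrees on the same estimate; with a real, noisy model the references would disagree, which is where averaging over several of them is assumed to help.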

Pranay Manocha, Anurag Kumar · 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Speech Quality Assessment | VoiceMOS 1 (test) | SC | 0.87 | 5
Speech Quality Assessment | VoiceMOS 2 (test) | SC Score | 0.9 | 5
Speech Quality Assessment | VoiceMOS (test 2) | RMSE | 0.39 | 5
Speech Quality Assessment | VoiceMOS (test 1) | RMSE | 0.42 | 5
Speech Quality Assessment | P23 (Experiment 1) | RMSE | 0.35 | 5
Speech Quality Assessment | P23 EXP3 | SC | 0.48 | 5
Speech Quality Assessment | TENCENT-Rev | SC | 0.2 | 5
Speech Quality Assessment | NISQA LiveTalk (test) | RMSE | 0.65 | 5
Speech Quality Assessment | P23 Experiment 3 | RMSE | 0.51 | 5
Speech Quality Assessment | NISQA (test) | Quality Score (SC) | 0.58 | 5

(10 of 19 benchmark rows shown.)
