UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment

About

Evaluating speech generation still relies heavily on human judgments, such as Mean Opinion Score (MOS), which are expensive, subjective, and difficult to reproduce at scale. While a few recent studies have begun to explore AudioLLM-based judge models, existing efforts typically target only a narrow set of scenarios (e.g., utterance-level quality or single-turn dialogue) and provide limited coverage of diverse speech generation tasks and evaluation dimensions. In this work, we propose UniSRM, a unified speech reward model that supports multi-dimensional, interpretable reward signals with reliable reasoning. To support training and evaluation, we introduce UniSRM-Data and UniSRM-Bench, covering speech evaluation tasks from utterance-level quality to context-level coherence. Building on these resources, we develop UniSRM with a two-stage pipeline that enables reasoning-based fine-grained assessment. Furthermore, we introduce Reasoning-Consistent Rewards to improve the reliability of the reasoning process. Experiments show that UniSRM delivers more reliable and human-aligned judgments across a broad range of speech evaluation tasks, offering a practical foundation for scalable and unified evaluation of speech quality.

Yuanyuan Wang, Dongchao Yang, Yayue Deng, Zhiyong Wu, Yiwen Guo, Helen Meng, Xixin Wu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| utterance-level pairwise preference judgement | UniSRM-BENCH T1 | Accuracy 65.06 | 12 |
| Speech Quality Assessment | BVCC | -- | 12 |
| multi-turn dialogue speech evaluation | UniSRM-BENCH T4 | Accuracy 88.89 | 10 |
| scenario-aware style consistency preference (Chinese) | UniSRM-BENCH T3-Zh | Accuracy 91.3 | 10 |
| scenario-aware style consistency preference (English) | UniSRM-BENCH T3-En | Accuracy 85.61 | 10 |
| fine-grained speech quality scoring | UniSRM-BENCH T2 | PCC 0.551 | 9 |
| Speech Quality Evaluation | SOMOS Clean | PCC 0.2612 | 5 |
| Speech Quality Evaluation | SOMOS (Full) | PCC 0.2347 | 5 |
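
The results above use two metrics: accuracy for the preference tasks and PCC (Pearson correlation coefficient) for the scoring tasks. The sketch below shows how such numbers are commonly computed. It is not the paper's evaluation code; the function names, the "A"/"B" label format, and the assumption that accuracy is reported as a percentage are illustrative only.

```python
import math
from typing import Sequence


def pairwise_accuracy(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Percentage of pairs where the predicted preference matches the human label.

    Labels are hypothetical ("A" or "B" for the preferred sample in each pair).
    """
    if len(predicted) != len(human) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == h for p, h in zip(predicted, human))
    return 100.0 * hits / len(human)


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and human scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)
```

For example, a judge that agrees with human annotators on three of four pairs scores 75.0 under `pairwise_accuracy`, and predicted scores that rise linearly with human scores give a `pearson_cc` close to 1.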