UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment

About

Evaluating speech generation still relies heavily on human judgments, such as Mean Opinion Score (MOS), which are expensive, subjective, and difficult to reproduce at scale. While a few recent studies have begun to explore AudioLLM-based judge models, existing efforts typically target only a narrow set of scenarios (e.g., utterance-level quality or single-turn dialogue) and provide limited coverage of diverse speech generation tasks and evaluation dimensions. In this work, we propose UniSRM, a unified speech reward model that can support multi-dimensional, interpretable reward signals with reliable reasoning. To support training and evaluation, we introduce UniSRM-Data and UniSRM-Bench, covering speech evaluation tasks from utterance-level quality to context-level coherence. Based on this dataset, we present the unified speech reward model, UniSRM, with a two-stage pipeline that enables reasoning-based fine-grained assessment. Furthermore, we introduce Reasoning-Consistent Rewards to improve the reliability of the reasoning process. Experiments show that UniSRM delivers more reliable and human-aligned judgments across a broad range of speech evaluation tasks, offering a practical foundation for scalable and unified evaluation of speech quality.

Yuanyuan Wang, Dongchao Yang, Yayue Deng, Zhiyong Wu, Yiwen Guo, Helen Meng, Xixin Wu • 2026

Related benchmarks

Task	Dataset	Result
utterance-level pairwise preference judgement	UniSRM-BENCH T1	Accuracy 65.06	12
Speech Quality Assessment	BVCC	--	12
multi-turn dialogue speech evaluation	UniSRM-BENCH T4	Accuracy 88.89	10
scenario-aware style consistency preference (Chinese)	UniSRM-BENCH T3-Zh	Accuracy 91.3	10
scenario-aware style consistency preference (English)	UniSRM-BENCH T3-En	Accuracy 85.61	10
fine-grained speech quality scoring	UniSRM-BENCH T2	PCC 0.551	9
Speech Quality Evaluation	SOMOS Clean	PCC 0.2612	5
Speech Quality Evaluation	SOMOS (Full)	PCC 0.2347	5
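
The table reports two kinds of metric: accuracy for the preference tasks and PCC (Pearson correlation coefficient) for the scoring tasks. The sketch below shows how such numbers are conventionally computed. It is an illustration only: the function names and toy inputs are hypothetical, and the exact evaluation protocol used for UniSRM-Bench is not described on this page.

```python
from math import sqrt


def pairwise_accuracy(predicted, reference):
    """Percentage of items where the predicted preference matches the human label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * hits / len(predicted)


def pearson_cc(xs, ys):
    """Pearson correlation between predicted scores and human scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data (made up for illustration, not taken from any benchmark).
model_choices = ["A", "B", "A", "B"]
human_choices = ["A", "B", "B", "B"]
model_scores = [1.0, 2.0, 3.0]
human_scores = [2.0, 4.0, 6.0]

acc = pairwise_accuracy(model_choices, human_choices)
pcc = pearson_cc(model_scores, human_scores)
```

Accuracy here is expressed on a 0 to 100 scale to match the table, while PCC ranges from -1 to 1, so the two columns of results are not directly comparable across task types.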
