
B-GRPO: Unsupervised Speech Emotion Recognition based on Batched-Group Relative Policy Optimization

About

Unsupervised speech emotion recognition (SER) addresses the data sparsity and annotation bias of emotional speech. Reinforcement learning (RL) is a promising approach that improves performance through rule-based or model-based verification functions rather than human annotations. We treat sample selection during learning as a long-term process, with the decision of whether to select a sample as the policy's action, which allows RL to be applied to measuring sample quality in SER. We propose a modified Group Relative Policy Optimization (GRPO) adapted to classification problems: the samples in a batch form a group, and their average reward serves as the baseline for computing the advantage. Rather than using a verifiable reward function as in GRPO, we introduce self-reward functions and teacher-reward functions that encourage the model to produce high-confidence outputs. Experiments indicate that the proposed method improves on the baseline without RL by 19.8%.
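The batch-as-group advantage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the self-reward, so `self_reward` here is a hypothetical stand-in that uses the model's top-class probability as a confidence score. The division by the standard deviation follows standard GRPO and is likewise an assumption about this variant. All function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_reward(probs: List[float]) -> float:
    """Hypothetical self-reward: confidence of the top predicted class.

    Assumption: the abstract only says the reward encourages
    high-confidence outputs; the exact form is not given there.
    """
    return max(probs)


def batch_group_advantages(
    rewards: List[float], normalize: bool = True, eps: float = 1e-8
) -> List[float]:
    """Treat the whole batch as one group and use its mean reward as baseline.

    With normalize=True the centered rewards are also divided by the batch
    standard deviation, as in standard GRPO (an assumption for this variant).
    """
    baseline = sum(rewards) / len(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    std = math.sqrt(sum(c * c for c in centered) / len(rewards))
    return [c / (std + eps) for c in centered]


# Toy batch of three samples with three emotion classes each.
batch_logits = [
    [2.0, 0.1, -1.0],   # fairly confident
    [0.3, 0.2, 0.1],    # nearly uniform, low confidence
    [-0.5, 3.0, 0.0],   # very confident
]
rewards = [self_reward(softmax(logits)) for logits in batch_logits]
advantages = batch_group_advantages(rewards)
```

In this toy batch the low-confidence second sample receives a negative advantage and the two confident samples receive positive ones, so a policy-gradient update weighted by these values would favor the confident samples.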

Yingying Gao, Shilei Zhang, Runyan Yang, Zihao Cui, Junlan Feng • 2026

Related benchmarks

Task | Dataset | Result | Rank
Speech Emotion Recognition | MELD | -- | 19
Speech Emotion Recognition | CAFE | Macro F1: 52 | 5
Speech Emotion Recognition | M3ED | Macro F1: 32.1 | 5
Speech Emotion Recognition | CASIA | Macro F1: 37 | 5
Speech Emotion Recognition | IEMOCAP | Macro F1 Score: 69.2 | 5
