Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation
About
Modern speech systems increasingly use discretized self-supervised speech representations for compression and integration with token-based models, yet their impact on emotional information remains unclear. We study how residual vector quantization (RVQ) reshapes emotional information in discrete speech representations from both representation- and task-level perspectives. Our analysis shows that aggressive compression disproportionately degrades emotion, with uneven loss across emotion classes and model architectures. To address this, we introduce emotion-aware quantization using emotion-specific and emotion-biased codebooks, improving the preservation of both hard and soft emotion perception. We further propose Emo-Q, a lightweight routed quantization method that selects emotion-specialized codebooks, improving emotion recognition performance at lower bitrates. These results highlight the importance of emotion-aware discretization for robust affective speech processing.
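The page does not include code, but the routed-quantization idea behind Emo-Q can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names (`rvq_encode`, `emo_q_encode`), the codebook layout, and the toy router standing in for a learned emotion classifier are hypothetical, not the authors' implementation.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization of a single frame embedding x.

    codebooks: list of (K, D) arrays, one codebook per quantization stage.
    Returns the selected code indices and the reconstructed vector.
    """
    residual = x.copy()
    recon = np.zeros_like(x)
    indices = []
    for cb in codebooks:
        # Pick the codeword nearest to the current residual.
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        recon += cb[idx]
        residual -= cb[idx]  # the next stage quantizes what remains
    return indices, recon

def emo_q_encode(x, router, emotion_codebooks):
    """Route a frame to an emotion-specialized codebook stack, then RVQ-encode.

    router: callable mapping a frame embedding to an emotion label
            (hypothetical stand-in for a learned routing module).
    emotion_codebooks: dict {emotion label: list of per-stage codebooks}.
    """
    emotion = router(x)
    indices, recon = rvq_encode(x, emotion_codebooks[emotion])
    return emotion, indices, recon

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K, stages = 16, 8, 2
    emotions = ["neutral", "happy"]
    books = {e: [rng.normal(size=(K, D)) for _ in range(stages)]
             for e in emotions}
    frame = rng.normal(size=D)
    # Toy router: a real system would use a trained emotion classifier.
    route = lambda v: emotions[int(v.sum() > 0)]
    emo, idxs, recon = emo_q_encode(frame, route, books)
    print(emo, idxs, float(np.linalg.norm(frame - recon)))
```

Under these assumptions, each frame spends the same number of stage indices as plain RVQ plus a routing decision, which is how a routed scheme can keep the bitrate low while dedicating codebook capacity to emotion-relevant structure.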
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech Emotion Recognition | Four OOD (test) | Macro-F1 Delta 1.57 | 21 |