A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction
About
This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
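As a rough illustration of the idea behind IEAT, the sketch below shows one way a training sample could place the user's emotion and its cause inside the model's reasoning segment instead of exposing them as a separate supervised label. The abstract does not specify the data format, so everything here is an assumption made for illustration: the `Turn` fields, the `<think>` delimiters, the sentence template, and the `build_ieat_sample` helper are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One annotated dialogue turn (hypothetical schema, not from the paper)."""
    user_text: str  # transcript of the user's spoken turn
    emotion: str    # annotated user emotion label
    cause: str      # annotated cause of that emotion
    response: str   # target empathetic response


def build_ieat_sample(turn: Turn) -> dict:
    """Inject the emotion and its cause into the reasoning part of the target.

    The <think>...</think> delimiters are an assumed convention; the paper
    may use a different marker or no explicit marker at all.
    """
    thinking = (
        f"The user sounds {turn.emotion}. "
        f"This is likely because {turn.cause}."
    )
    target = f"<think>{thinking}</think>{turn.response}"
    return {"input": turn.user_text, "target": target}


sample = build_ieat_sample(
    Turn(
        user_text="I failed my driving test again.",
        emotion="frustrated",
        cause="they failed the driving test a second time",
        response="That sounds really discouraging. Do you want to talk it through?",
    )
)
print(sample["target"])
```

In this sketch the emotion and cause appear only inside the reasoning span of the target sequence, which is one plausible reading of "internalized rather than treated as explicit supervision".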
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Question Answering | TELEVAL AQA-en (dev) | TELEVAL Score | 57.69 | 6 |
| Emotional Reasoning | HumDial Challenge Track 1 Task 2-zh (dev) | LLM Score | 4.98 | 6 |
| Emotional Reasoning | HumDial Challenge Track 1 Task 2-en (dev) | LLM Score | 4.83 | 6 |
| Emotional Trajectory Detection | HumDial Challenge Track 1 Task 1-zh (dev) | LLM Score (0-5) | 4.98 | 6 |
| Emotional Trajectory Detection | HumDial Challenge Track 1 Task 1-en (dev) | LLM Score (0-5) | 4.87 | 6 |
| Empathetic Response Generation | HumDial Challenge Track 1 Task 3-en (dev) | LLM Score (0-5) | 4.36 | 6 |
| Audio Question Answering | TELEVAL AQA-zh (dev) | TELEVAL Score | 37.38 | 6 |
| Empathetic Response Generation | HumDial Challenge Track 1 Task 3-zh (dev) | LLM Score (0-5) | 4.53 | 6 |
| Spoken emotional intelligence evaluation | HumDial Challenge Track 1 1.0 (test) | Task 1 Score | 4.97 | 5 |