
FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs

About

Empathy is essential for fostering natural interactions in spoken dialogue systems, as it enables machines to recognize the emotional tone of human speech and deliver empathetic responses. Recent research has made significant progress in developing empathetic spoken chatbots based on large language models (LLMs). However, several challenges remain when training such models, including reliance on costly empathetic speech instruction data and a lack of emotional expressiveness in the generated speech. Fine-tuning the LLM on cross-modal empathetic instruction data may also lead to catastrophic forgetting and degrade its general capabilities. To address these challenges, we propose FreezeEmpath, an end-to-end empathetic spoken chatbot trained in a simple and efficient manner. The entire training process relies solely on existing speech instruction data and speech emotion recognition (SER) data, while keeping the LLM's parameters frozen. Experiments show that FreezeEmpath generates emotionally expressive speech and outperforms other empathetic models on empathetic dialogue, SER, and SpokenQA tasks, demonstrating the effectiveness of our training strategy.

Yun Hong, Yan Zhou, Yang Feng • 2026
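
The abstract's central idea, training the surrounding components while the LLM's parameters stay frozen, can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the scalar "model", the names `FROZEN_W`, `forward`, and `train_adapter`, and all numeric values are hypothetical and chosen only to show that gradient updates reach the adapter and never the frozen weight.

```python
# Toy illustration: optimize an adapter while a downstream "LLM" weight stays frozen.
# All names and values are hypothetical; a real system would use a speech encoder,
# an adapter network, and an actual LLM rather than two scalars.

FROZEN_W = 2.0  # stand-in for the frozen LLM: output = FROZEN_W * hidden


def forward(adapter_w: float, x: float) -> float:
    """Map an input feature through the trainable adapter, then the frozen model."""
    hidden = adapter_w * x
    return FROZEN_W * hidden


def train_adapter(data, adapter_w=0.0, lr=0.01, epochs=200):
    """Gradient descent on squared error, updating only the adapter weight."""
    for _ in range(epochs):
        for x, y in data:
            err = forward(adapter_w, x) - y
            # d/d(adapter_w) of err**2 is 2 * err * FROZEN_W * x
            grad = 2.0 * err * FROZEN_W * x
            adapter_w -= lr * grad
            # FROZEN_W is never updated: it plays the role of the frozen LLM.
    return adapter_w


# Targets follow y = 6 * x, so the adapter should approach 3.0 (since 2.0 * 3.0 = 6.0).
data = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]
learned = train_adapter(data)
```

Because only the adapter weight is changed, whatever the frozen component already does is left intact, which is the intuition behind avoiding catastrophic forgetting.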

Related benchmarks

Task                                 Dataset            Result              Rank
Speech Emotion Recognition           RAVDESS            --                  43
Speech-to-Speech Question-Answering  Llama Questions    Accuracy 79.33      27
Speech Emotion Recognition           MELD               Accuracy 57.5       24
Speech-to-Speech Question-Answering  TriviaQA           Accuracy 49.71      22
Spoken Question Answering            TriviaQA           Accuracy 46.39      15
Spoken Question Answering            Web Questions      Accuracy 44.34      12
Empathetic Dialogue                  SpeechAlpaca S2T   Quality Score 8.76  5
Empathetic Dialogue                  VStyle-Empathy en  Anger Score 4.55    5
Speech Emotion Recognition           CASIA              Accuracy 72.4       5
Speech Emotion Recognition           CAFE               Accuracy 79.3       5
Showing 10 of 15 rows
