
EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech

About

This work presents EmoAra, an end-to-end emotion-preserving pipeline for cross-lingual spoken communication, motivated by banking customer service where emotional context affects service quality. EmoAra integrates Speech Emotion Recognition, Automatic Speech Recognition, Machine Translation, and Text-to-Speech to process English speech and deliver an Arabic spoken output while retaining emotional nuance. The system uses a CNN-based emotion classifier, Whisper for English transcription, a fine-tuned MarianMT model for English-to-Arabic translation, and MMS-TTS-Ara for Arabic speech synthesis. Experiments report an F1-score of 94% for emotion classification, translation performance of BLEU 56 and BERTScore F1 88.7%, and an average human evaluation score of 81% on banking-domain translations. The implementation and resources are available at the accompanying GitHub repository.
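The four stages named above can be pictured as a simple chain. The following is a minimal sketch, not the authors' implementation: the class and field names are hypothetical, the stages are stand-in stubs rather than the real CNN classifier, Whisper, MarianMT, or MMS-TTS-Ara models, and passing the emotion label to the synthesizer is an assumption, since the abstract does not say how the detected emotion is carried through to the output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    emotion: str
    english_text: str
    arabic_text: str
    arabic_audio: bytes


@dataclass
class EmotionPreservingPipeline:
    """Hypothetical wrapper chaining the four stages named in the abstract."""
    classify_emotion: Callable[[bytes], str]   # Speech Emotion Recognition
    transcribe: Callable[[bytes], str]         # Automatic Speech Recognition
    translate: Callable[[str], str]            # English-to-Arabic translation
    synthesize: Callable[[str, str], bytes]    # TTS; (text, emotion) is an assumption

    def run(self, english_audio: bytes) -> PipelineResult:
        # Emotion and transcript are both derived from the same input audio.
        emotion = self.classify_emotion(english_audio)
        english_text = self.transcribe(english_audio)
        arabic_text = self.translate(english_text)
        # Assumption: the emotion label is handed to the synthesizer.
        arabic_audio = self.synthesize(arabic_text, emotion)
        return PipelineResult(emotion, english_text, arabic_text, arabic_audio)


# Stub stages for illustration only; real models would replace each callable.
pipeline = EmotionPreservingPipeline(
    classify_emotion=lambda audio: "calm",
    transcribe=lambda audio: "I need help with my account",
    translate=lambda text: f"<ar>{text}</ar>",
    synthesize=lambda text, emotion: f"{emotion}:{text}".encode("utf-8"),
)
result = pipeline.run(b"\x00\x01")
print(result.emotion, result.arabic_text)
```

Keeping each stage behind a plain callable means any one model can be swapped or evaluated on its own, which matches the per-stage metrics reported in the abstract.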

Besher Hassan, Ibrahim Alsarraj, Musaab Hasan, Yousef Melhim, Shahem Fadi, Shahem Sultan • 2026

Related benchmarks

Task                 Dataset         Result  Rank
Emotion Recognition  RAVDESS (test)  -       -
17
