Hermes the Polyglot: A Unified Framework to Enhance Expressiveness for Multimodal Interlingual Subtitling
About
Interlingual subtitling, which translates subtitles of visual media into a target language, is essential for entertainment localization but has not yet been explored in machine translation. Although Large Language Models (LLMs) have significantly advanced the general capabilities of machine translation, the distinctive characteristics of subtitle texts pose persistent challenges in interlingual subtitling, particularly regarding semantic coherence, pronoun and terminology translation, and translation expressiveness. To address these issues, we present Hermes, an LLM-based automated subtitling framework. Hermes integrates three modules: Speaker Diarization, Terminology Identification, and Expressiveness Enhancement, which effectively tackle the above challenges. Experiments demonstrate that Hermes achieves state-of-the-art diarization performance and generates expressive, contextually coherent translations, thereby advancing research in interlingual subtitling.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Subtitle Translation | subtitle dataset en-zh | Translation Score | 90.6 | 24 |
| Subtitle Translation | zh-th subtitle dataset | Translation Score | 91.9 | 12 |
| Subtitle Translation | Subtitle dataset en-de | Translation Score | 95.4 | 12 |
| Subtitle Translation | ko-zh subtitle dataset | Translation Score | 84.3 | 12 |
| Subtitle Translation | subtitle dataset en-fr | Translation Score | 94.1 | 12 |
| Speaker Diarization | Chinese | DER | 8.325 | 5 |
| Speaker Diarization | Chinese Hard | DER | 10.18 | 5 |
| Speaker Diarization | English | DER | 10.272 | 5 |
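
For readers unfamiliar with the DER figures reported for the Speaker Diarization rows: diarization error rate is conventionally defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below illustrates that standard definition only. The function name and the example durations are hypothetical, and it assumes the reported values are percentages; it is not the evaluation code used for Hermes.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a percentage of total reference speech time.

    All arguments are durations in seconds:
      missed          - reference speech the system labeled as non-speech
      false_alarm     - non-speech the system labeled as speech
      confusion       - speech attributed to the wrong speaker
      total_reference - total speech time in the reference annotation
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_reference


# Hypothetical durations, chosen only to illustrate the arithmetic.
example = diarization_error_rate(
    missed=2.0, false_alarm=1.0, confusion=1.0, total_reference=40.0
)
print(f"DER = {example:.2f}%")  # prints "DER = 10.00%"
```

Lower is better: a DER of 10% means that one tenth of the reference speech time was missed, spuriously detected, or assigned to the wrong speaker.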