
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

About

Current movie dubbing technology can generate the desired voice from a given speech prompt, ensuring good synchronization between speech and visuals while accurately conveying the intended emotions. However, key aspects of movie dubbing have not been well studied: adapting to different dubbing styles, handling dialogue, narration, and monologue effectively, and understanding subtle details such as the age and gender of speakers. To address this challenge, we propose a framework built on a multi-modal large language model. First, it applies multimodal Chain-of-Thought (CoT) reasoning to visual inputs to understand dubbing styles and fine-grained attributes. Second, it generates high-quality dubbing through large speech generation models, guided by multimodal conditions. Additionally, we have developed a movie dubbing dataset with CoT annotations. The evaluation results demonstrate a performance improvement over state-of-the-art methods across multiple datasets. In particular, under dubbing setting 2.0 on the V2C Animation dataset, SPK-SIM increases from 82.48% to 89.74% and EMO-SIM increases from 66.24% to 78.88%; under dubbing setting 2.0 on the Grid dataset, LSE-D decreases from 14.79 to 14.63 and MCD-SL decreases from 5.24 to 4.74; and under the initial reasoning setting on the proposed CoT-Movie-Dubbing dataset, SPK-SIM increases from 64.03 to 83.42 and WER decreases from 52.69% to 23.20%, all in comparison with state-of-the-art models.
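
To make the size of the reported gains easier to compare across metrics, the figures quoted in the abstract can be turned into absolute and relative changes. The sketch below is only illustrative arithmetic on those quoted numbers; the `RESULTS` list, the `summarize` helper, and the `higher_is_better` flag are hypothetical names introduced here and are not part of the paper or its code.

```python
# Illustrative arithmetic on the (baseline, proposed) pairs quoted in the abstract.
# RESULTS, summarize, and higher_is_better are hypothetical names for this sketch.
RESULTS = [
    # (dataset / setting, metric, baseline, proposed, higher_is_better)
    ("V2C Animation, setting 2.0", "SPK-SIM (%)", 82.48, 89.74, True),
    ("V2C Animation, setting 2.0", "EMO-SIM (%)", 66.24, 78.88, True),
    ("Grid, setting 2.0", "LSE-D", 14.79, 14.63, False),
    ("Grid, setting 2.0", "MCD-SL", 5.24, 4.74, False),
    ("CoT-Movie-Dubbing, initial reasoning", "SPK-SIM", 64.03, 83.42, True),
    ("CoT-Movie-Dubbing, initial reasoning", "WER (%)", 52.69, 23.20, False),
]


def summarize(baseline, proposed, higher_is_better):
    """Return (absolute change, relative change in %, whether it is an improvement)."""
    delta = proposed - baseline
    relative = 100.0 * delta / baseline
    improved = delta > 0 if higher_is_better else delta < 0
    return delta, relative, improved


if __name__ == "__main__":
    for setting, metric, baseline, proposed, higher_is_better in RESULTS:
        delta, relative, improved = summarize(baseline, proposed, higher_is_better)
        status = "improved" if improved else "regressed"
        print(f"{setting:38s} {metric:12s} {delta:+7.2f} ({relative:+6.2f}%) {status}")
```

Relative change is computed against the baseline value, so metrics with small baselines (such as MCD-SL) can show a large percentage shift from a small absolute difference.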

Junjie Zheng, Zihao Chen, Chaofan Ding, Xinhan Di • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Dubbing | V2C-Animation + Chem + GRID (test) | MCD (DTW) | 7.46 | 8 |
| Dubbing | CineDub-CN Corrected (test) | MCD-DTW | 5.25 | 7 |
| Movie Dubbing | V2C2GRID | DD | 0.3995 | 6 |
| Movie Dubbing | V2C2Chem | DD | 0.5041 | 6 |
| Movie Dubbing | GRID2Chem zero-shot | DD (Sync Error) | 0.5041 | 6 |
| Movie Dubbing | Chem2V2C zero-shot | DD (Synchronization) | 0.5756 | 6 |
| Dubbing | V2C-Animation | DD | 0.5756 | 6 |
| Dubbing | Chem | DD (Delay) | 0.5041 | 6 |
| Dubbing | GRID | DD | 0.3995 | 6 |
| Movie Dubbing | GRID2V2C | DD (Sync Error) | 0.5756 | 6 |
Showing 10 of 11 rows
