CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models

About

Automatic speech editing aims to modify spoken content based on textual instructions, yet traditional cascade systems rely on explicit temporal alignment and complex preprocessing. To address these limitations, we propose CosyEdit, an end-to-end speech editing model adapted from CosyVoice through task-specific post-training and a complementary training paradigm, which internalizes text-speech alignment while ensuring high consistency between the speech before and after editing. Trained on only 250 hours of supervised data from our curated GigaEdit dataset, our 400M-parameter model achieves reliable speech editing performance. Extensive evaluations show that CosyEdit not only outperforms several billion-parameter language model baselines but also approaches state-of-the-art cascade systems. These results show that robust and efficient speech editing can be unlocked from a zero-shot TTS model through post-training, offering a cost-effective end-to-end solution for high-quality speech editing. Code and audio samples are available at https://cjy1018.github.io/CosyEditDemoPage/.

Junyang Chen, Yuhang Jia, Hui Wang, Jiaming Zhou, Yong Qin • 2026

Related benchmarks

Task	Dataset	Result
Text-to-Speech	Seed-TTS-Eval zh (test)	CER 1.76	21
Speech Editing	RealEdit	WER 4.5	15
Speech Editing (Deletion)	Ming-Freeform-Audio-Edit English (basic)	DNSMOS 3.1	14
Speech Editing (Substitution)	Ming-Freeform-Audio-Edit English (full)	DNSMOS 3.13	14
Speech Editing (Deletion)	Ming-Freeform-Audio-Edit English (full)	DNSMOS 3.09	14
Speech Editing (Insertion)	Ming-Freeform-Audio-Edit English (basic)	DNSMOS 3.1	14
Speech Editing (Insertion)	Ming-Freeform-Audio-Edit English (full)	DNSMOS 3.11	14
Speech Editing (Substitution)	Ming-Freeform-Audio-Edit English (basic)	DNSMOS 3.11	14
Multilingual Voice Cloning	CV3-Eval Multilingual Voice Cloning (hard-en)	WER 13.93	6
Speech Editing	Ming-Freeform-Audio-Edit English Insertion	IMOS 4.543	6

Showing 10 of 21 rows
