FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

About

This work presents FreeSVC, a multilingual singing voice conversion approach that builds on an enhanced VITS model, using Speaker-invariant Clustering (SPIN) for better content representation and the state-of-the-art (SOTA) speaker encoder ECAPA2. FreeSVC incorporates trainable language embeddings to handle multiple languages and employs an advanced speaker encoder to disentangle speaker characteristics from linguistic content. Designed for zero-shot learning, FreeSVC enables cross-lingual singing voice conversion without extensive language-specific training. We demonstrate that a multilingual content extractor is crucial for optimal cross-language conversion. Our source code and models are publicly available.

Alef Iury Siqueira Ferreira, Lucas Rafael Gris, Augusto Seben da Rosa, Frederico Santos de Oliveira, Edresson Casanova, Rafael Teixeira Sousa, Arnaldo Candido Junior, Anderson da Silva Soares, Arlindo Galvão Filho • 2025

Related benchmarks

Task	Dataset	Metric	Result	Rank
Singing Voice Conversion	SVC GT Leading (test)	Speaker Similarity	0.65	10
Singing Voice Conversion	SVC Mix Vocal (test)	SPK-SIM	65.7	5
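
The two rows report speaker similarity on different scales (0.65 and 65.7), and this page does not say how either benchmark computes the metric. A common approach in voice conversion, assumed here, is the cosine similarity between speaker embeddings of the converted audio and the target speaker's audio, reported either as a fraction or as a percentage. The sketch below illustrates that assumption only; the function names and toy vectors are hypothetical, and real embeddings would come from a speaker encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(pairs):
    """Average cosine similarity over (converted, target) embedding pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(cosine_similarity(c, t) for c, t in pairs) / len(pairs)


# Toy 2-D embeddings (hypothetical; real speaker embeddings are much larger).
pairs = [
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 2.0], [0.0, 5.0]),
]
score = mean_speaker_similarity(pairs)   # fraction scale
score_percent = 100.0 * score            # percentage scale
```

Under this assumption, a score reported as 0.65 and one reported as 65.7 would sit on the same underlying scale, differing only by the factor of 100. Scores from different datasets are still not directly comparable.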
