
Code-Switched Text Synthesis in Unseen Language Pairs

About

Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, limiting the deployment of the models to cases lacking code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. Adjusting only the code-switching module prevents the model from overfitting to the constrained code-switching training data. Hence, GLOSS generalizes and synthesizes code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs to further enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves at least 55% relative improvement in BLEU and METEOR scores over strong baselines. Human evaluations on two language pairs further validate the success of GLOSS.
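The core design in the abstract is parameter-efficient: the pre-trained backbone (PMMTM) stays frozen and only a small code-switching module is trained. Below is a minimal, hedged sketch of that pattern; all class names, parameter counts, and attributes are illustrative stand-ins, not taken from the paper's code.

```python
# Illustrative sketch of the GLOSS training setup described above:
# the PMMTM backbone is frozen, and only the small code-switching
# module (an adapter or prefixes) receives gradient updates.
# Names and sizes here are hypothetical.

class Module:
    """Minimal stand-in for a neural module with a trainable flag."""
    def __init__(self, name, n_params, trainable=True):
        self.name = name
        self.n_params = n_params
        self.trainable = trainable

class GLOSS:
    def __init__(self):
        # Frozen pre-trained multilingual MT backbone.
        self.pmmtm = Module("pmmtm", n_params=600_000_000, trainable=False)
        # Small trainable code-switching module (adapter or prefixes).
        self.cs_module = Module("cs_adapter", n_params=2_000_000, trainable=True)

    def trainable_parameters(self):
        # Only trainable modules are updated; freezing the backbone
        # keeps the model from overfitting the limited CS data.
        return [m for m in (self.pmmtm, self.cs_module) if m.trainable]

model = GLOSS()
names = [m.name for m in model.trainable_parameters()]
```

In a real framework this corresponds to disabling gradients on the backbone and passing only the adapter's parameters to the optimizer.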

I-Hung Hsu, Avik Ray, Shubham Garg, Nanyun Peng, Jing Huang • 2023
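The abstract also mentions a self-training algorithm on target language pairs. One common shape for such a loop is to pseudo-label unlabeled target-pair data with the current model, keep only confident outputs, and retrain on them; the confidence filter and loop structure below are generic assumptions, not the paper's exact algorithm.

```python
# Generic self-training loop (assumed structure, not the paper's exact method):
# pseudo-label unlabeled target-pair sentences, filter by confidence, retrain.

def self_train(model, unlabeled, rounds=1, threshold=0.5):
    for _ in range(rounds):
        pseudo = []
        for src in unlabeled:
            out, conf = model.generate(src)  # candidate CS text + confidence
            if conf >= threshold:            # keep only confident outputs
                pseudo.append((src, out))
        model.fit(pseudo)                    # retrain (e.g., only the CS module)
    return model

class ToyModel:
    """Toy stand-in: 'generates' by uppercasing, confident on longer inputs."""
    def __init__(self):
        self.train_pairs = []
    def generate(self, src):
        return src.upper(), (0.9 if len(src) > 3 else 0.2)
    def fit(self, pairs):
        self.train_pairs.extend(pairs)

trained = self_train(ToyModel(), ["hola mundo", "ok"])
```

The low-confidence input ("ok") is filtered out, so only the confident pseudo-pair reaches training.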

Related benchmarks

Task                         | Dataset                      | Result        | Rank
Code-switched text synthesis | Es-En                        | BLEU 24.85    | 11
Code-switched text synthesis | Bn-En                        | BLEU 9.65     | 11
Code-switched text synthesis | De-En                        | BLEU 21.88    | 11
Code-switched text synthesis | Hi-En                        | BLEU 12.16    | 11
Code-switched text synthesis | Hindi-English (Hi-En) (test) | CS Rate 93.3  | 6
Code-switched text synthesis | SEAME Chinese-English (test) | CS Rate 0.993 | 5

Other info

Code
