
Conditioning LLMs to Generate Code-Switched Text

About

Code-switching (CS) is still a critical challenge in Natural Language Processing (NLP), due to the limited availability of large-scale, diverse CS datasets for robust training and evaluation. Despite recent advances, the capabilities and limitations of LLMs in handling CS are still not fully understood. In this work, we investigate the extent to which LLMs can be used in a framework for CS text generation, focusing on the English-Spanish language pair. Our proposed methodology consists of back-translating natural CS sentences into monolingual English, and using the resulting parallel corpus to fine-tune LLMs to turn monolingual sentences into CS. We thoroughly analyse the models' performance through a study on human preferences, a qualitative error analysis, an evaluation with popular reference-based metrics and LLM-based judgment. Results show that fine-tuning can be a key step to ensure that current LLMs consistently generate fluent code-switched text and that our methodology generates high-quality outputs, expanding research opportunities in CS communication. We find that traditional metrics do not correlate with human judgment when assessing the quality of the generated CS data, but LLM-based judgment aligns more closely with human preferences. We release our code and generated dataset under a CC-BY-NC-SA license.
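The data-creation step described above (back-translate natural CS sentences into monolingual English, then pair them so a model can be fine-tuned in the English-to-CS direction) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `back_translate_to_english` function is a hypothetical stand-in for whatever MT system is used, which the abstract does not specify.

```python
import json


def back_translate_to_english(cs_sentence: str) -> str:
    """Hypothetical stand-in for an MT system that renders a
    code-switched sentence entirely in English."""
    # Toy lookup table standing in for real machine translation.
    toy_mt = {
        "I want to comprar un coche nuevo": "I want to buy a new car",
    }
    return toy_mt.get(cs_sentence, cs_sentence)


def build_parallel_corpus(cs_sentences):
    """Pair each natural CS sentence with its monolingual English
    back-translation, giving (source=English, target=CS) examples
    suitable for fine-tuning an English-to-CS generator."""
    return [
        {"source": back_translate_to_english(s), "target": s}
        for s in cs_sentences
    ]


corpus = build_parallel_corpus(["I want to comprar un coche nuevo"])
print(json.dumps(corpus[0], ensure_ascii=False))
```

Note the direction: the naturally occurring CS sentence becomes the *target*, so the fine-tuned model learns to produce authentic code-switched text rather than synthetic CS.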

Maite Heredia, Gorka Labaka, Jeremy Barnes, Aitor Soroa • 2025

Related benchmarks

Task                          | Dataset                                                              | Result                  | Rank
Code-switched Text Generation | English-to-Code-Switched human preference evaluation (In domain)     | Preference Score: 573.5 | 12
Code-switched Text Generation | EN-CS In domain (test)                                               | BLEU: 35.56             | 6
Code-switched Text Generation | EN-CS Out of domain (test)                                           | BLEU: 15.65             | 6
Code-switched Text Generation | English-to-Code-Switched human preference evaluation (Out of domain) | Score: 282              | 6
Code-Switching Generation     | Code-Switching (CS) Generation Evaluation Dataset                    | Score: 603              | 6
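The BLEU figures in the table are reference-based scores of the kind the abstract reports as correlating poorly with human judgment for CS text. As a rough illustration of how BLEU combines modified n-gram precision with a brevity penalty, here is a simplified sketch; it caps at bigrams for readability, whereas real evaluations typically use a standard implementation such as sacrebleu with 4-gram precision.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(hypothesis: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of modified n-gram precisions
    (up to max_n) times a brevity penalty, scaled to 0-100.
    Illustrative only; not a drop-in replacement for sacrebleu."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each hypothesis n-gram count by its reference count.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * geo_mean
```

An exact match scores 100, while a hypothesis sharing no n-grams with the reference scores 0; for code-switched outputs, valid switch points that differ from the single reference are penalized, which is one plausible reason such metrics diverge from human preference.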
