CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations
About
Prior work establishes that controlling the contrastiveness between a large language model's self-generated responses, as measured by reward scores, improves downstream preference tuning in English. We extend this method to multiple languages and evaluate two models across a total of 14 high- and low-resource languages on a diverse set of tasks. Our central finding is that cross-lingual contrastive preference tuning on self-generations (CroCo) transfers without language-specific preference annotation. A reward model trained on English preferences (atop a multilingual base) produces useful within-language rankings across most languages, and pairing responses in either a monolingual or a multilingual setting improves over each base model on the majority of setups while avoiding the catastrophic forgetting seen with supervised fine-tuning. We observe that the gains require on-policy data: off-policy responses reduce the benefit, and online preference optimization fails to improve over the offline variant. Specifically, on structured tasks, our method matches or exceeds the base model in 6/7 languages for EuroLLM-9B and in 4/7 settings for Aya-3B. On open-ended generation, both tuned models win against their respective base models across 11 evaluated languages. Overall, our results point to promising directions for multilingual preference tuning.
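The pairing step described above can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so everything here is an assumption made for illustration: the names `Scored` and `build_pair` are hypothetical, the chosen response is taken to be the highest-reward self-generation, and the rejected response is the one whose reward gap to the chosen response is closest to a hypothetical `target_gap` parameter.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Scored:
    """One self-generated response with its reward-model score (hypothetical type)."""
    text: str
    reward: float


def build_pair(candidates: Sequence[Scored], target_gap: float) -> Tuple[Scored, Scored]:
    """Pick a (chosen, rejected) pair from responses sampled for a single prompt.

    Assumed rule, not taken from the paper: the chosen response is the
    highest-reward candidate; the rejected response is the remaining candidate
    whose reward gap to the chosen one is closest to target_gap.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    # Rank within one prompt (and hence within one language) by reward.
    ranked = sorted(candidates, key=lambda c: c.reward, reverse=True)
    chosen = ranked[0]
    rejected = min(
        ranked[1:],
        key=lambda c: abs((chosen.reward - c.reward) - target_gap),
    )
    return chosen, rejected


# Four hypothetical self-generations for one prompt, already scored.
samples = [
    Scored("response A", 4.0),
    Scored("response B", 3.0),
    Scored("response C", 1.0),
    Scored("response D", -2.0),
]
chosen, rejected = build_pair(samples, target_gap=3.0)
```

Because the ranking is computed per prompt, the sketch only relies on the reward model ordering responses correctly within a language, which is the property the abstract reports for the English-trained reward model.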
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Natural Language Understanding | EuroEval German | EuroEval Score | 0.1 | 18 |
| Natural Language Understanding | EuroEval English | EuroEval Score | 7 | 18 |
| Natural Language Understanding | EuroEval Spanish | EuroEval Score | 4.5 | 18 |
| Natural Language Understanding | EuroEval French | EuroEval Score | 7.2 | 18 |
| Natural Language Understanding | EuroEval Danish | EuroEval Score | 7.4 | 14 |
| Natural Language Understanding | EuroEval Dutch | EuroEval Dutch Score | 7.3 | 14 |
| Natural Language Understanding | EuroEval Italian | EuroEval Score | 10.6 | 14 |
| Cross-lingual NLP evaluation | EuroEval Norwegian (held-out) | EuroEval Score | 5.7 | 4 |
| Cross-lingual NLP evaluation | EuroEval Portuguese (held-out) | EuroEval Portuguese Score | 8.2 | 4 |
| Cross-lingual NLP evaluation | EuroEval Swedish (held-out) | EuroEval Score | 5.2 | 4 |