CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

About

Prior work establishes that controlled contrastiveness between self-generated responses from large language models, set via reward scores, improves downstream preference tuning in English. We extend this method to multiple languages and evaluate two models across a total of 14 high- and low-resource languages on a diverse set of tasks. Our central finding is that cross-lingual contrastive preference tuning on self-generations (CroCo) transfers without language-specific preference annotation. A reward model trained on English preferences (atop a multilingual base) produces useful within-language rankings across most languages, and pairing in either a monolingual or multilingual setting improves over each base model on the majority of setups while preventing the catastrophic forgetting seen with supervised fine-tuning. We observe that the gains require on-policy data: off-policy responses reduce the benefit, and online preference optimization fails to improve over the offline variant. Specifically, on structured tasks, our method matches or exceeds the base in 6/7 languages for EuroLLM-9B and 4/7 settings for Aya-3B. On open-ended generation, both tuned models win against their respective base across 11 evaluated languages. Overall, we show promising directions for multilingual preference tuning.
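
The abstract does not say how contrastiveness is controlled, so the following is only a minimal sketch of one way reward scores could be used to pair self-generated responses. It assumes that the highest-scoring response is taken as "chosen" and that "rejected" is the lower-scored response whose reward gap is closest to a target value. The names `Scored`, `build_pair`, and `target_gap`, and the reward values, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Scored:
    text: str      # a response sampled from the model being tuned (on-policy)
    reward: float  # score from a reward model; values below are made up


def build_pair(candidates: List[Scored],
               target_gap: float) -> Optional[Tuple[Scored, Scored]]:
    """Return (chosen, rejected) for one prompt, or None if no pair exists.

    Assumption (not from the paper): 'chosen' is the top-reward response and
    'rejected' is the lower-scored response whose reward gap to 'chosen' is
    closest to target_gap.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.reward, reverse=True)
    chosen = ranked[0]
    # Only strictly lower-scored responses can serve as 'rejected'.
    lower = [c for c in ranked[1:] if c.reward < chosen.reward]
    if not lower:
        return None
    rejected = min(
        lower,
        key=lambda c: abs((chosen.reward - c.reward) - target_gap),
    )
    return chosen, rejected


# Hypothetical self-generations for a single prompt.
samples = [Scored("a", 0.9), Scored("b", 0.7), Scored("c", 0.2)]
wide = build_pair(samples, target_gap=0.5)
narrow = build_pair(samples, target_gap=0.1)
```

With these made-up scores, a larger `target_gap` pairs the best response with a clearly worse one, while a smaller gap pairs it with a near miss. In a cross-lingual setting as described above, the same routine would be applied per prompt to responses scored within one language.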

Mike Zhang, Ali Basirat, Desmond Elliott • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | EuroEval German | EuroEval Score: 0.1 | 18 |
| Natural Language Understanding | EuroEval English | EuroEval Score: 7 | 18 |
| Natural Language Understanding | EuroEval Spanish | EuroEval Score: 4.5 | 18 |
| Natural Language Understanding | EuroEval French | EuroEval Score: 7.2 | 18 |
| Natural Language Understanding | EuroEval Danish | EuroEval Score: 7.4 | 14 |
| Natural Language Understanding | EuroEval Dutch | EuroEval Dutch Score: 7.3 | 14 |
| Natural Language Understanding | EuroEval Italian | EuroEval Score: 10.6 | 14 |
| Cross-lingual NLP evaluation | EuroEval Norwegian (held-out) | EuroEval Score: 5.7 | 4 |
| Cross-lingual NLP evaluation | EuroEval Portuguese (held-out) | EuroEval Portuguese Score: 8.2 | 4 |
| Cross-lingual NLP evaluation | EuroEval Swedish (held-out) | EuroEval Score: 5.2 | 4 |
