
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

About

Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, such as ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data despite its human origin. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating translations that are adequate but not perfect. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
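As described in the paper, the CPO objective pairs a contrastive preference term (prefer the better translation over the merely adequate one) with a negative log-likelihood regularizer on the preferred translation. Below is a minimal plain-Python sketch of that loss for a single sentence pair; the function name, the β value, and the scalar log-probability inputs are illustrative assumptions, not code from the paper:

```python
import math

def cpo_loss(chosen_logp: float, rejected_logp: float, beta: float = 0.1) -> float:
    """Sketch of a CPO-style loss for one preference pair.

    chosen_logp / rejected_logp: model log-probabilities of the preferred
    and dispreferred translations (sums over tokens, so <= 0).
    """
    # Preference term: -log sigmoid(beta * (logp_chosen - logp_rejected)),
    # which is small when the model already prefers the chosen translation.
    margin = beta * (chosen_logp - rejected_logp)
    prefer_term = -math.log(1.0 / (1.0 + math.exp(-margin)))
    # NLL regularizer: keeps probability mass on the preferred translation.
    nll_term = -chosen_logp
    return prefer_term + nll_term

# Loss drops as the model ranks the preferred translation higher.
loss_good = cpo_loss(chosen_logp=-1.0, rejected_logp=-2.0)
loss_bad = cpo_loss(chosen_logp=-2.0, rejected_logp=-1.0)
```

In practice the log-probabilities would come from the LLM's token-level outputs and the loss would be averaged over a batch; this sketch only shows the shape of the objective.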

Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-turn Dialogue Evaluation | MT-Bench | - | 331 |
| Instruction Following | AlpacaEval 2.0 | LC Win Rate: 21.34 | 281 |
| Bias Evaluation | BBQ | Accuracy: 95.2 | 99 |
| LLM Alignment Evaluation | AlpacaEval 2 | LC Win Rate: 38.1 | 72 |
| Machine Translation | Flores-200 Romance group en->xx (test) | BLEU: 33.53 | 46 |
| Machine Translation | Flores-200 Romance group xx->en (test) | BLEU: 36.44 | 46 |
| Language Model Alignment Evaluation | Arena Hard v0.1 | Win Rate (%): 30 | 36 |
| General Mathematics Reasoning | Math-G College-math, Math-OAI, Minerva-math (test) | Accuracy: 53.7 | 24 |
| Competition Mathematics Reasoning | Math-C AIME24, AMC23, Olympiadbench (test) | Accuracy: 33.5 | 24 |
| General Reasoning | General-R MMLU-stem, ARC-challenge (test) | Accuracy: 61.4 | 24 |

Showing 10 of 28 rows
