Iterative Translation Refinement with Large Language Models
About
We propose iteratively prompting a large language model to self-correct a translation, an approach inspired by LLMs' strong language understanding and translation capabilities as well as by human-like translation workflows. Interestingly, multi-turn querying lowers the output's string-based metric scores, but neural metrics suggest comparable or improved quality. Human evaluations indicate better fluency and naturalness compared to initial translations and even to human references, all while maintaining quality. Ablation studies underscore the importance of anchoring the refinement to the source text and of a reasonable seed translation. We also discuss the challenges in evaluation and the relation to human performance and translationese.
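The refinement loop described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: `query_llm` is a hypothetical stand-in for a real LLM API call (here a deterministic stub), and the prompt wording is an assumption.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical LLM call. A real implementation would query an
    LLM API here; this stub just echoes the last prompt line so the
    sketch runs end to end."""
    return prompt.splitlines()[-1]


def refine_translation(source: str, seed_translation: str, rounds: int = 3) -> str:
    """Iteratively ask the model to improve a translation, anchoring
    every round to the original source sentence (the ablation in the
    paper suggests this anchoring matters for quality)."""
    translation = seed_translation
    for _ in range(rounds):
        prompt = (
            f"Source: {source}\n"
            f"Current translation: {translation}\n"
            "Refine the translation, keeping it faithful to the source. "
            "Reply with only the refined text.\n"
            # Stub convention: the last line is what query_llm echoes back.
            + translation
        )
        revised = query_llm(prompt).strip()
        # Keep the previous version if the model returns nothing useful.
        translation = revised or translation
    return translation
```

With the echo stub the loop is a no-op; swapping `query_llm` for a real model call yields the multi-turn refinement the abstract describes.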
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Machine Translation | De-En document-level | d-COMET | 87.73 | 36 |
| Machine Translation | WMT 22 De-En (test) | COMET | 86.86 | 29 |
| Machine Translation | WMT 2023 (test) | COMET | 87.6 | 24 |
| Machine Translation | Fr-En | COMET | 0.8763 | 21 |
| Machine Translation | Ru-En document-level | d-COMET | 83.87 | 18 |
| Machine Translation | En-Ru document-level | d-COMET | 85.63 | 18 |
| Machine Translation | En-Fr document-level | d-COMET | 85.06 | 18 |
| Machine Translation | En-Zh document-level | d-COMET | 82.93 | 18 |
| Machine Translation | Es-En document-level | d-COMET | 88.23 | 18 |
| Machine Translation | 10-language machine translation evaluation suite (test) | De->En Score | 89.33 | 18 |