
COMET: A Neural Framework for MT Evaluation

About

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
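To make "exploit information from both the source input and a target-language reference" concrete, here is a minimal sketch. It assumes the estimator design from the full COMET paper: pooled sentence embeddings for the source (s), the hypothesis (h), and the reference (r) are combined into the feature vector [h; r; h⊙s; h⊙r; |h−s|; |h−r|], which a feed-forward regressor maps to a quality score trained against human judgements with mean squared error. The toy 2-d embeddings, `linear_head`, and its weights are hypothetical stand-ins for the cross-lingual encoder and the trained regressor. They are not the library's API.

```python
from typing import List, Sequence

Vector = List[float]


def combine_features(src: Sequence[float], hyp: Sequence[float], ref: Sequence[float]) -> Vector:
    """Concatenate [h; r; h*s; h*r; |h-s|; |h-r|] from pooled sentence embeddings."""
    if not (len(src) == len(hyp) == len(ref)):
        raise ValueError("embeddings must share one dimension")
    prod_src = [h * s for h, s in zip(hyp, src)]       # element-wise h * s
    prod_ref = [h * r for h, r in zip(hyp, ref)]       # element-wise h * r
    diff_src = [abs(h - s) for h, s in zip(hyp, src)]  # |h - s|
    diff_ref = [abs(h - r) for h, r in zip(hyp, ref)]  # |h - r|
    return list(hyp) + list(ref) + prod_src + prod_ref + diff_src + diff_ref


def linear_head(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear stand-in for the trained feed-forward regressor."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted and human quality scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    s, h, r = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]  # toy "embeddings", not real encoder output
    x = combine_features(s, h, r)
    print(x)  # [3.0, 4.0, 5.0, 6.0, 3.0, 8.0, 15.0, 24.0, 2.0, 2.0, 2.0, 2.0]
    weights = [0.01] * len(x)  # hypothetical weights
    score = linear_head(x, weights, bias=0.0)
    print(round(score, 2))  # 0.76
    print(mse([score], [0.8]))  # training loss against a toy human score
```

The element-wise products and absolute differences give the regressor explicit signals of how closely the hypothesis matches the source and the reference, rather than leaving it to infer those relations from the raw concatenation alone.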

Ricardo Rei, Craig Stewart, Ana C. Farinha, Alon Lavie • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Machine Translation Meta-evaluation | WMT Metrics Shared Task Segment-level 2023 (Primary submissions) | Avg Correlation | 0.622 | 33
Machine Translation Meta-evaluation | WMT MQM (En-De, En-Es, Ja-Zh) 24 | SPA | 82.4 | 28
Machine Translation Evaluation | WMT MQM Segment-level 22 | Score (En-De) | 59.4 | 19
Machine Translation Evaluation | WMT MQM System-level 22 | Overall Score | 83.9 | 19
Quality Estimation | Portuguese (pt-BR) dialect sentences (test) | Success Rate | 54 | 11
Quality Estimation | Mandarin (zh-CN) dialect sentences (test) | Success Rate | 53 | 11
Machine Translation Evaluation | WMT 2019 (test) | de-en | 0.219 | 10
Machine Translation | English-to-Czech | COMET-DA | 0.9 | 8
Machine Translation | English-to-German | COMET-DA | 60.5 | 8
Machine Translation | English-to-Russian | COMET-DA | 65.8 | 8

Showing 10 of 27 rows.
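The page does not say which correlation the WMT 2019 (test) de-en figure uses. The sketch below assumes the Kendall's Tau-like formulation used for segment-level relative-ranking judgements in the WMT Metrics shared task: over pairs of translations of the same source that humans ranked, tau = (Concordant − Discordant) / (Concordant + Discordant). Counting metric ties as discordant is our assumption about that convention. The function name `kendall_tau_like` and the toy pairs are illustrative only and are not benchmark data.

```python
from typing import Iterable, Tuple


def kendall_tau_like(pairs: Iterable[Tuple[float, float]]) -> float:
    """Kendall's Tau-like correlation over human-ranked translation pairs.

    Each pair is (metric score of the translation humans preferred,
    metric score of the translation humans rated lower).
    Assumption: a metric tie counts as discordant.
    """
    concordant = discordant = 0
    for better, worse in pairs:
        if better > worse:
            concordant += 1  # metric agrees with the human preference
        else:
            discordant += 1  # metric disagrees, or ties
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one pair")
    return (concordant - discordant) / total


if __name__ == "__main__":
    toy_pairs = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.3, 0.5)]  # toy metric scores
    print(kendall_tau_like(toy_pairs))  # 0.5
```

Because the statistic only checks whether the metric orders each pair the same way as the human annotators, it is insensitive to the metric's absolute scale. That property matters when comparing metrics such as COMET whose scores are not calibrated to human DA ranges.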
