
COMET: A Neural Framework for MT Evaluation

About

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.

Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie • 2020

Related benchmarks

Task | Dataset | Result | Rank
Speech Translation Evaluation | MuST-C | Pearson Correlation: 0.9896 | 94
Speech Translation Metric Evaluation | Europarl-ST (test) | Average Correlation: 0.9857 | 84
Machine Translation Meta-evaluation | WMT Metrics Shared Task Segment-level 2023 (Primary submissions) | Avg Correlation: 0.622 | 33
Machine Translation Meta-evaluation | WMT MQM (En-De, En-Es, Ja-Zh) 24 | SPA: 82.4 | 28
Machine Translation Evaluation | WMT MQM Segment-level 22 | Score (En-De): 59.4 | 19
Machine Translation Evaluation | WMT MQM System-level 22 | Overall Score: 83.9 | 19
Metric Correlation with Human Judgments | Hearing-to-Translate five language pairs | Correlation (aya_canary): 0.4743 | 15
Quality Estimation | Portuguese (pt-BR) dialect sentences (test) | Success Rate: 54 | 11
Quality Estimation | Mandarin (zh-CN) dialect sentences (test) | Success Rate: 53 | 11
Machine Translation Evaluation | WMT 2019 (test) | de-en: 0.219 | 10
Showing 10 of 30 rows
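The benchmark rows above report agreement with human judgements as correlation coefficients (e.g. Pearson correlation on MuST-C). A minimal sketch of how a segment-level Pearson correlation between metric scores and human scores is computed; the score lists below are made-up illustrations, not values from these benchmarks:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical metric scores vs. hypothetical human Direct Assessment scores
metric_scores = [0.71, 0.42, 0.88, 0.35, 0.60]
human_scores = [78.0, 45.0, 92.0, 40.0, 66.0]
print(round(pearson(metric_scores, human_scores), 4))
```

A correlation near 1 means the metric ranks segments almost exactly as the human annotators did, which is what the high-scoring rows in the table indicate.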
