
Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

About

We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated in all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches are ranked first for all tasks, reaching state-of-the-art performance for quality estimation at word-, span-, and sentence-level granularity. Compared to the previous state of the art, COMETKIWI-22, we show large improvements in correlation with human judgements (up to 10 Spearman points). Moreover, we surpass the second-best multilingual submission to the shared task by up to 3.8 absolute points.

Ricardo Rei, Nuno M. Guerreiro, José Pombal, Daan van Stigt, Marcos Treviso, Luisa Coheur, José G.C. de Souza, André F.T. Martins • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Machine Translation Meta-evaluation | WMT MQM (En-De, En-Es, Ja-Zh) 24 | SPA 85.4 | 28 |
| Machine Translation Ranking | NT20 En→Zh | Accuracy 66.49 | 11 |
| Machine Translation Ranking | GenMT MQM En→De 22 | Accuracy 61.2 | 11 |
| Machine Translation Ranking | GenMT22 MQM En→Ru | Accuracy 67.12 | 11 |
| Machine Translation Ranking | NT20 Zh→En | Accuracy 57.82 | 11 |
| Machine Translation Ranking | GenMT22 (MQM) Zh→En | Accuracy 61.6 | 11 |
| Machine Translation Ranking | Seed-X-Challenge Zh↔En | Accuracy 46.72 | 11 |
| Machine Translation Ranking | Gemini-annotated held-out Zh↔En (test) | Accuracy 72.01 | 10 |
| Quality Estimation | En-Ml | Pearson r 0.454 | 9 |
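The "Accuracy" metric in the Machine Translation Ranking rows plausibly denotes pairwise ranking accuracy: the fraction of translation pairs that the metric orders the same way as the human judgement. That interpretation, and the scores below, are assumptions for illustration, not details from the benchmark pages:

```python
def pairwise_ranking_accuracy(metric_scores, human_scores):
    """Fraction of pairs (i, j) where the metric and the human
    judgement agree on which translation is better; human ties
    are skipped."""
    agree = total = 0
    n = len(metric_scores)
    for i in range(n):
        for j in range(i + 1, n):
            if human_scores[i] == human_scores[j]:
                continue  # no human preference for this pair
            total += 1
            if (metric_scores[i] > metric_scores[j]) == (
                human_scores[i] > human_scores[j]
            ):
                agree += 1
    return agree / total

# Hypothetical metric and human scores for four translations.
print(pairwise_ranking_accuracy([0.7, 0.2, 0.5, 0.9],
                                [0.6, 0.3, 0.5, 0.8]))  # → 1.0
```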
