CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
About
We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated in all three subtasks: (i) Sentence- and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks, we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance on downstream tasks across several language pairs, and that jointly training with sentence- and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations from sentence-level QE models. Overall, our submissions achieved the best results for all three tasks, for almost all language pairs, by a considerable margin.
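The architecture described above combines three pieces: a shared encoder, a sentence-level quality regression head, a word-level OK/BAD tagging head, and an explanation score built from attention and gradient information. The NumPy sketch below is purely illustrative; it is not the CometKiwi implementation (which builds on COMET and OpenKiwi with a pretrained multilingual transformer). All names (`E`, `W_sent`, `W_word`, `forward`) are hypothetical, and the "encoder" is a frozen random embedding table standing in for the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a multilingual transformer encoder: a frozen random
# embedding table. (Illustrative only; not the COMET/OpenKiwi encoder.)
vocab, dim, seq_len = 50, 16, 6
E = rng.normal(size=(vocab, dim))

# Two heads sharing the encoder output, mirroring joint training with
# sentence- and word-level objectives:
W_sent = rng.normal(size=(dim,))    # sentence-level quality regression head
W_word = rng.normal(size=(dim, 2))  # word-level OK/BAD tagging head


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def forward(token_ids):
    H = E[token_ids]                          # (seq_len, dim) token states
    attn = softmax(H @ H.mean(axis=0))        # toy attention over tokens
    pooled = attn @ H                          # attention-weighted pooling
    sent_score = float(pooled @ W_sent)        # sentence-level quality score
    word_tags = softmax(H @ W_word, axis=-1)   # per-token OK/BAD probabilities
    # Explanation scores combining attention and gradient information:
    # treating attn as constant, d(sent_score)/dH_i = attn_i * W_sent, so the
    # per-token saliency is attention weight times gradient norm.
    saliency = attn * (attn * np.linalg.norm(W_sent))
    return sent_score, word_tags, saliency


tokens = rng.integers(0, vocab, size=seq_len)
score, tags, expl = forward(tokens)
print(tags.shape, expl.shape)  # (6, 2) (6,)
```

The key design point mirrored here is that the word-level tags and the sentence score come from the same encoder states, so the two objectives can be trained jointly, while the explanation is extracted post hoc from attention and gradients rather than from a separately trained explainer.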
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation Meta-evaluation | WMT Metrics Shared Task Segment-level 2023 (Primary submissions) | Avg Correlation | 0.632 | 33 |
| Machine Translation Meta-evaluation | MENT ZH-EN | Meta Score | 42.8 | 30 |
| Machine Translation Meta-evaluation | MENT EN-ZH | Meta Score | 42.8 | 30 |
| Machine Translation Meta-evaluation | WMT MQM (En-De, En-Es, Ja-Zh) 24 | SPA | 73.3 | 28 |
| Machine Translation | WMT ZH-EN 22 | COMET | 76.9 | 20 |
| Quality Estimation | WMT EN-DE 22 | Pearson Correlation | 0.722 | 15 |
| Quality Estimation | WMT 24 | Pearson Correlation | 0.377 | 12 |
| Machine Translation | WMT JA-EN 22 | COMET | 76.2 | 12 |
| Machine Translation | WMT EN-ZH 22 | COMET | 82.9 | 12 |
| Quality Estimation | ParaCrawl | Pearson Correlation | 0.537 | 8 |