Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation
About
We propose Dynamic Meta-Metrics (DMM), a framework for machine translation evaluation that learns source-sentence conditioned combinations of existing metrics. Rather than relying on a single static ensemble or language-specific weighting, DMM adapts the metric combination based on properties of the source segment. We study hard conditioning, which fits an interpretable combiner per cluster, and an exploratory soft-conditioned extension whose weights vary continuously with source-cluster responsibilities. We evaluate DMM on the WMT Metrics Shared Task data across multiple language pairs using pairwise agreement measures at the system and segment levels. Across settings, MLP-based combinations outperform linear and Gaussian process-based ensembles, and introducing soft conditioning yields gains over linear models.
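As a rough illustration of the difference between hard and soft conditioning described above, the following is a minimal sketch and not the authors' implementation. It assumes linear per-cluster combiners, source-sentence embeddings, cluster centroids, and responsibilities computed as a softmax over negative squared distances. All names (`responsibilities`, `hard_combine`, `soft_combine`, `temperature`) are hypothetical and chosen only for this example.

```python
import numpy as np

def responsibilities(src_emb, centroids, temperature=1.0):
    """Soft cluster assignments for each source embedding.

    Assumption: softmax over negative squared Euclidean distances.
    src_emb: (n, d), centroids: (k, d) -> returns (n, k), rows sum to 1.
    """
    d2 = ((src_emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)

def hard_combine(metric_scores, resp, cluster_weights):
    """Hard conditioning: use the weight vector of the most likely cluster.

    metric_scores: (n, m), resp: (n, k), cluster_weights: (k, m) -> (n,)
    """
    k = resp.argmax(axis=1)
    return (metric_scores * cluster_weights[k]).sum(axis=1)

def soft_combine(metric_scores, resp, cluster_weights):
    """Soft conditioning: mix per-cluster weights by responsibility.

    The effective weights vary continuously with the responsibilities.
    """
    w = resp @ cluster_weights  # (n, m) per-segment metric weights
    return (metric_scores * w).sum(axis=1)

# Toy usage: two clusters, two metrics, two source segments.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
src = np.array([[0.0, 0.0], [10.0, 10.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = np.array([[0.2, 0.8], [0.2, 0.8]])

resp = responsibilities(src, centroids, temperature=0.01)
hard = hard_combine(scores, resp, weights)
soft = soft_combine(scores, resp, weights)
```

When the responsibilities are close to one-hot, as in this toy example, the soft combination reduces to the hard one. With a larger `temperature`, segments near cluster boundaries receive a blend of the per-cluster weights.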
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation Meta-evaluation | WMT EN-CS 2025 | Acc*Eq | 61.4 | 17 |
| Machine Translation Meta-evaluation | WMT EN-ZH 2025 | Acc*Eq | 56.8 | 17 |
| Machine Translation Meta-evaluation | WMT EN-JA 2025 | Acc*Eq | 57.3 | 17 |
| Machine Translation Meta-evaluation | WMT EN-UK 2025 | Acc*Eq | 0.557 | 17 |