Error Span Detection on WMT24 (test)
[Chart: SPA over time; top entry Llama-MBR-SOFTF1 at 84.8 (Dec 8, 2025).]
Evaluation Results
| Method | Setting | Date | SPA | Acc*eq | SOFTF1 | F1 |
|---|---|---|---|---|---|---|
| Llama-MBR-SOFTF1 | Backbone=Llama-3.3-70B... | 2025.12 | 84.8 | 57.1 | 93.2 | 51.3 |
| xCOMET-Reg | Evaluation Mode=Regres... | 2025.12 | 84.4 | 58.1 | - | - |
| xCOMET-QE-Reg | Evaluation Mode=Regres... | 2025.12 | 82.5 | 54.9 | - | - |
| Llama-MAP | Backbone=Llama-3.3-70B... | 2025.12 | 82.3 | 56.8 | 91.9 | 53.1 |
| xCOMET-ESD | Evaluation Mode=Token-... | 2025.12 | 75.7 | 55.3 | 88.9 | 30.2 |
| xCOMET-QE-ESD | Evaluation Mode=Token-... | 2025.12 | 68.8 | 54.1 | 87.9 | 28.9 |
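The leading method depends on which metric you sort by. The following is a minimal sketch that loads the scores from the Evaluation Results and ranks them per metric. `RESULTS` and `rank_by` are hypothetical names introduced only for illustration. `None` stands for a "-" (not reported) entry, and the sketch assumes that higher is better for all four metrics.

```python
from typing import Optional

# Scores copied from the Evaluation Results; None marks a "-" (not reported) cell.
RESULTS: dict[str, dict[str, Optional[float]]] = {
    "Llama-MBR-SOFTF1": {"SPA": 84.8, "Acc*eq": 57.1, "SOFTF1": 93.2, "F1": 51.3},
    "xCOMET-Reg":       {"SPA": 84.4, "Acc*eq": 58.1, "SOFTF1": None, "F1": None},
    "xCOMET-QE-Reg":    {"SPA": 82.5, "Acc*eq": 54.9, "SOFTF1": None, "F1": None},
    "Llama-MAP":        {"SPA": 82.3, "Acc*eq": 56.8, "SOFTF1": 91.9, "F1": 53.1},
    "xCOMET-ESD":       {"SPA": 75.7, "Acc*eq": 55.3, "SOFTF1": 88.9, "F1": 30.2},
    "xCOMET-QE-ESD":    {"SPA": 68.8, "Acc*eq": 54.1, "SOFTF1": 87.9, "F1": 28.9},
}


def rank_by(metric: str) -> list[tuple[str, float]]:
    """Return (method, score) pairs sorted best-first, skipping unreported scores.

    Assumes higher is better for every metric (hypothetical helper).
    """
    scored = [
        (name, score)
        for name, row in RESULTS.items()
        if (score := row[metric]) is not None
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Print the top-ranked method for each metric column.
    for metric in ("SPA", "Acc*eq", "SOFTF1", "F1"):
        best_method, best_score = rank_by(metric)[0]
        print(f"{metric}: {best_method} ({best_score})")
```

Run as-is, the script reports that Llama-MBR-SOFTF1 leads on SPA (84.8) and SOFTF1 (93.2), xCOMET-Reg leads on Acc*eq (58.1), and Llama-MAP leads on F1 (53.1). The regression-mode xCOMET entries drop out of the SOFTF1 and F1 rankings because those scores are not reported.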