Translation Preference Prediction on WMT EN -> DE
Top result: 51.6 Pairwise Acc (Distribution-Calibrated Aggregation, Dec 2, 2025)
Evaluation Results
| Method | Settings | Date | Pairwise Acc |
|---|---|---|---|
| Distribution-Calibrated Aggregation | n=12, Judge LLM=gemini... | 2025.12 | 51.6 |
| Distribution-Calibrated Aggregation | n=4, Judge LLM=gemini-... | 2025.12 | 51 |
| Soft-SC | n=4, Judge LLM=gemini-... | 2025.12 | 49.6 |
| Soft-SC | n=12, Judge LLM=gemini... | 2025.12 | 47.7 |
| CI-SC | n=4, Judge LLM=gemini-... | 2025.12 | 47.3 |
| SC | n=12, Judge LLM=gemini... | 2025.12 | 46.7 |
| CI-SC | n=12, Judge LLM=gemini... | 2025.12 | 46.5 |
| GSC | n=12, Judge LLM=gemini... | 2025.12 | 46.3 |
| GSC | n=4, Judge LLM=gemini-... | 2025.12 | 45.2 |
| USC | n=12, Judge LLM=gemini... | 2025.12 | 44.7 |
| SC | n=4, Judge LLM=gemini-... | 2025.12 | 44.2 |
| USC | n=4, Judge LLM=gemini-... | 2025.12 | 43.6 |