Machine Translation on IWSLT en-ja 17
Top result: LongGuide, 41.22 ROUGE-L (Jun 2, 2025)
Metrics: ROUGE-L, GPT-4o Judge Score
Updated 4d ago
Evaluation Results
Method            Settings                    Date      ROUGE-L   GPT-4o Judge Score
LongGuide         backbone=ChatGPT, shots=0   2025.06   41.22     8.11
LongGuide         backbone=ChatGPT, shots=3   2025.06   38.43     7.91
APO               backbone=ChatGPT, shots=0   2025.06   37.74     7.44
ChatGPT           shots=0                     2025.06   36.13     7.62
APO               backbone=ChatGPT, shots=3   2025.06   33.72     7.31
ChatGPT           shots=3                     2025.06   31.93     7.25
LongGuide         backbone=Mistral-it (0...   2025.06   16.62     3.4
LongGuide         backbone=Mistral-it (0...   2025.06   16.53     3.45
APO               backbone=Mistral-it (0...   2025.06   14.45     2.91
APO               backbone=Mistral-it (0...   2025.06   14.08     2.92
Mistral-it (0.2)  shots=0                     2025.06   13.12     2.82
Mistral-it (0.2)  shots=3                     2025.06   12.69     2.66