Summarization on XL-Sum
Top result: LongGuide, 20.93 ROUGE-L (Jun 2, 2025)
Metrics: ROUGE-L, GPT-4o Judge Score
Updated 4d ago
Evaluation Results
Method            Settings                   Links    ROUGE-L  GPT-4o Judge Score
LongGuide         backbone=ChatGPT, shots=0  2025.06  20.93    6.36
LongGuide         backbone=ChatGPT, shots=5  2025.06  19.95    6.36
LongGuide         backbone=Mistral-it (0...  2025.06  15.23    5.06
LongGuide         backbone=Mistral-it (0...  2025.06  14.38    6.29
APO               backbone=ChatGPT, shots=5  2025.06  14.07    6.19
APO               backbone=ChatGPT, shots=0  2025.06  12.19    6.07
APO               backbone=Mistral-it (0...  2025.06  12.06    5.85
APO               backbone=Mistral-it (0...  2025.06  11.99    4.55
ChatGPT           shots=5                    2025.06  11.42    5.95
ChatGPT           shots=0                    2025.06  10.8     5.96
Mistral-it (0.2)  shots=5                    2025.06  9.79     4.46
Mistral-it (0.2)  shots=0                    2025.06  9.19     5.96