Summarization on CNN 3.0.0
[Chart: best ROUGE-L 22.46 (LongGuide), Jun 2, 2025. Metrics shown: ROUGE-L, GPT-4o Judge Score.]
Evaluation Results
Method            Configuration              Date     ROUGE-L  GPT-4o Judge Score
LongGuide         backbone=Mistral-it (0...  2025.06  22.46    7.45
LongGuide         backbone=ChatGPT, shots=0  2025.06  22.19    7.67
APO               backbone=ChatGPT, shots=0  2025.06  20.34    7.39
ChatGPT           shots=0                    2025.06  20.12    7.44
APO               backbone=Mistral-it (0...  2025.06  19.53    7.4
Mistral-it (0.2)  shots=0                    2025.06  19.23    7.38
LongGuide         backbone=Mistral-it (0...  2025.06  19.19    5.99
APO               backbone=Mistral-it (0...  2025.06  18.18    5.89
LongGuide         backbone=ChatGPT, shots=3  2025.06  18.17    4.42
Mistral-it (0.2)  shots=3                    2025.06  17.56    5.84
APO               backbone=ChatGPT, shots=3  2025.06  15.2     4.01
ChatGPT           shots=3                    2025.06  14.51    4.38