Summarization on SAMSum (ROUGE-L, GPT-4o-Judge)
[Chart: ROUGE-L over time. Top result: LongGuide, ROUGE-L 31.46 (Jun 2, 2025). Metrics shown: ROUGE-L, GPT-4o-Judge.]
Evaluation Results
| Method | Configuration | Date | ROUGE-L | GPT-4o-Judge |
|---|---|---|---|---|
| LongGuide | backbone=ChatGPT, shots=3 | 2025.06 | 31.46 | 7.72 |
| LongGuide | backbone=Mistral-it (0... | 2025.06 | 30.65 | 7.72 |
| LongGuide | backbone=ChatGPT, shots=0 | 2025.06 | 30.47 | 7.59 |
| LongGuide | backbone=Mistral-it (0... | 2025.06 | 28.35 | 7.73 |
| Mistral-it (0.2) | shots=3 | 2025.06 | 27.13 | 7.66 |
| APO | backbone=Mistral-it (0... | 2025.06 | 26.23 | 7.44 |
| APO | backbone=ChatGPT, shots=0 | 2025.06 | 25.05 | 7.45 |
| APO | backbone=ChatGPT, shots=3 | 2025.06 | 24.22 | 7.28 |
| ChatGPT | shots=0 | 2025.06 | 23.83 | 7.43 |
| APO | backbone=Mistral-it (0... | 2025.06 | 23.77 | 7.31 |
| ChatGPT | shots=3 | 2025.06 | 22.21 | 7.32 |
| Mistral-it (0.2) | shots=0 | 2025.06 | 22.2 | 7.43 |