Text Generation on XSum
[Chart: F1 Score and SSim on XSum. Highlighted point: Prompt-R1, F1 Score 25.76, Nov 2, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | F1 Score | SSim |
|---|---|---|---|---|
| Prompt-R1 | | 2025.11 | 25.76 | 63.02 |
| TextGrad | Category=APO (GPT-4o-m... | 2025.11 | 24.88 | 28.45 |
| Baseline | Backbone=GPT-4o-mini | 2025.11 | 24.35 | 60.56 |
| GRPO | Backbone=Qwen3-4B | 2025.11 | 21.98 | 52.09 |
| SFT | Backbone=Qwen3-4B | 2025.11 | 21.13 | 55.65 |
| OPRO | Category=APO (GPT-4o-m... | 2025.11 | 17.87 | 29.47 |
| Baseline | Backbone=Qwen3-4B | 2025.11 | 16.75 | 31.73 |
| CoT Reasoning | Backbone=GPT-4o-mini | 2025.11 | 8.92 | 53.94 |
| CoT Reasoning | Backbone=Qwen3-4B | 2025.11 | 8.33 | 6.81 |
| GEPA | Category=APO (GPT-4o-m... | 2025.11 | 0.21 | 0.37 |