Simplification on SWiPE
[Chart: ROUGE-L on SWiPE by date. Best result: APO, 46.32 ROUGE-L (Jun 2, 2025). Available metrics: ROUGE-L, GPT-4o-Judge.]
Evaluation Results
Method            Settings                    Date     ROUGE-L  GPT-4o-Judge
APO               backbone=ChatGPT, shots=0   2025.06  46.32    7.51
ChatGPT           shots=0                     2025.06  45.09    7.28
LongGuide         backbone=ChatGPT, shots=0   2025.06  45.09    7.28
LongGuide         backbone=Mistral-it (0...   2025.06  41.36    7.24
APO               backbone=Mistral-it (0...   2025.06  39.55    7.11
Mistral-it (0.2)  shots=3                     2025.06  39.47    7.12
LongGuide         backbone=Mistral-it (0...   2025.06  38.21    7.32
LongGuide         backbone=ChatGPT, shots=3   2025.06  37.6     5.25
APO               backbone=Mistral-it (0...   2025.06  36.92    7.21
Mistral-it (0.2)  shots=0                     2025.06  36.6     7.21
APO               backbone=ChatGPT, shots=3   2025.06  34.46    5.13
ChatGPT           shots=3                     2025.06  33.72    5.07