Dialogue Generation on Syn. Persona
[Chart: ROUGE-L by date. Highlighted point: LongGuide, 22.98 ROUGE-L, Jun 2, 2025. Metrics: ROUGE-L, GPT-4o Score.]
Updated 4d ago
Evaluation Results
| Method | Settings | Date | ROUGE-L | GPT-4o Score |
|---|---|---|---|---|
| LongGuide | backbone=ChatGPT, shots=0 | 2025.06 | 22.98 | 6.41 |
| LongGuide | backbone=ChatGPT, shots=5 | 2025.06 | 22.36 | 5.26 |
| APO | backbone=ChatGPT, shots=0 | 2025.06 | 19.91 | 6.12 |
| ChatGPT | shots=0 | 2025.06 | 19.46 | 6.04 |
| APO | backbone=ChatGPT, shots=5 | 2025.06 | 17.68 | 4.55 |
| ChatGPT | shots=5 | 2025.06 | 16.1 | 4.67 |
| LongGuide | backbone=Mistral-it (0... | 2025.06 | 14.69 | 4.45 |
| Mistral-it (0.2) | shots=0 | 2025.06 | 12.76 | 2.68 |
| APO | backbone=Mistral-it (0... | 2025.06 | 10.66 | 2.41 |
| LongGuide | backbone=Mistral-it (0... | 2025.06 | 5.25 | 3.93 |
| APO | backbone=Mistral-it (0... | 2025.06 | 4.26 | 1.05 |
| Mistral-it (0.2) | shots=5 | 2025.06 | 3.56 | 1.09 |