Task-Focused Dialogue on Mgshop (TSE, Reward, BLEU)
Chart (metrics: TSE, Reward, BLEU): best TSE is 94.75, achieved by GOPO (Jan 24, 2026).
Evaluation Results
Method     Configuration          Date     TSE    Reward  BLEU
GOPO       Backbone=Qwen3-14B     2026.01  94.75  7.63    27.9
DeepSeek   Model Version=R1       2026.01  93.81  7.46    14.3
GLM        Model Version=4.7      2026.01  93.65  7.25    15.0
GPT        Model Version=5.2      2026.01  93.38  7.54    9.7
Gemini     Model Version=2.5      2026.01  92.87  7.35    13.3
GOPO       Backbone=Qwen-7B-Chat  2026.01  92.43  7.38    21.1
Qwen       Model Version=235B     2026.01  92.27  7.24    18.2
PPO        -                      2026.01  85.84  7.09    19.0
Memento    -                      2026.01  83.81  7.13    18.8
SFT        -                      2026.01  83.63  6.25    18.7
Untrained  -                      2026.01  74.54  5.97    9.1
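The leaderboard is ordered by TSE, but the three metrics do not always agree. Below is a minimal Python sketch for re-ranking the results by another metric, or for measuring each method's absolute gain over the Untrained baseline. The scores are transcribed from the evaluation results (the Date column is omitted). ROWS, rank_by, and gain_over are hypothetical helpers for illustration only and are not part of the benchmark site.

```python
# Hypothetical helpers: scores transcribed by hand from the leaderboard.
# Each row is (method, configuration, TSE, Reward, BLEU); "" means no configuration listed.
ROWS = [
    ("GOPO", "Backbone=Qwen3-14B", 94.75, 7.63, 27.9),
    ("DeepSeek", "Model Version=R1", 93.81, 7.46, 14.3),
    ("GLM", "Model Version=4.7", 93.65, 7.25, 15.0),
    ("GPT", "Model Version=5.2", 93.38, 7.54, 9.7),
    ("Gemini", "Model Version=2.5", 92.87, 7.35, 13.3),
    ("GOPO", "Backbone=Qwen-7B-Chat", 92.43, 7.38, 21.1),
    ("Qwen", "Model Version=235B", 92.27, 7.24, 18.2),
    ("PPO", "", 85.84, 7.09, 19.0),
    ("Memento", "", 83.81, 7.13, 18.8),
    ("SFT", "", 83.63, 6.25, 18.7),
    ("Untrained", "", 74.54, 5.97, 9.1),
]

# Column index of each metric within a row tuple.
METRICS = {"TSE": 2, "Reward": 3, "BLEU": 4}


def rank_by(metric):
    """Return (method, configuration, score) tuples sorted best-first by one metric."""
    idx = METRICS[metric]
    ordered = sorted(ROWS, key=lambda r: r[idx], reverse=True)
    return [(r[0], r[1], r[idx]) for r in ordered]


def gain_over(baseline, metric):
    """Absolute difference between every row and a named baseline row on one metric."""
    idx = METRICS[metric]
    base = next(r[idx] for r in ROWS if r[0] == baseline)
    # Round to two decimals to hide floating-point noise from the subtraction.
    return {(r[0], r[1]): round(r[idx] - base, 2) for r in ROWS}


if __name__ == "__main__":
    for metric in METRICS:
        method, config, score = rank_by(metric)[0]
        print(f"{metric}: best = {method} {config} ({score})")
    print(gain_over("Untrained", "TSE")[("GOPO", "Backbone=Qwen3-14B")])
```

Re-ranking by BLEU instead of TSE changes the order considerably: both GOPO variants come first (27.9 and 21.1), PPO is third (19.0), and GPT 5.2 (9.7) sits just above the Untrained baseline (9.1).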