Task-Focused Dialogue on TmallBrand-A (TSE, Reward, BLEU)
Top result: DeepSeek, TSE Score 85.12 (Jan 24, 2026)
[Chart: TSE Score, Reward Score, and BLEU Score over time]
Evaluation Results

| Method    | Variant               | Date    | TSE Score | Reward Score | BLEU Score |
|-----------|-----------------------|---------|-----------|--------------|------------|
| DeepSeek  | Model Version=R1      | 2026.01 | 85.12     | 7.44         | 14.1       |
| GOPO      | Backbone=Qwen3-14B    | 2026.01 | 85.11     | 7.46         | 18.6       |
| GLM       | Model Version=4.7     | 2026.01 | 84.87     | 7.09         | 15.3       |
| GPT       | Model Version=5.2     | 2026.01 | 84.26     | 6.72         | 9.5        |
| GOPO      | Backbone=Qwen-7B-Chat | 2026.01 | 83.93     | 6.88         | 19.2       |
| Qwen      | Model Version=235B    | 2026.01 | 83.27     | 6.38         | 11.4       |
| Gemini    | Model Version=2.5     | 2026.01 | 80.84     | 6.91         | 13.1       |
| PPO       |                       | 2026.01 | 80.4      | 6.39         | 18.1       |
| Memento   |                       | 2026.01 | 77.93     | 6.44         | 16.8       |
| SFT       |                       | 2026.01 | 76.89     | 5.59         | 16.5       |
| Untrained |                       | 2026.01 | 67.91     | 5.25         | 8.3        |