Persuasive Dialogue on ToM-BPD interactive evaluation
[Chart: Win Rate: Identification. Highlighted entry: Qwen3-8B + TTBYS vs. GPT-5, 55.23, May 21, 2026.]
Updated 12d ago
Evaluation Results
| Method | Details | Date | Win Rate: Identification | Lose Rate: Identification | Win Rate: Empathy | Lose Rate: Empathy | Win Rate: Persuasion | Lose Rate: Persuasion | Win Rate: Fluency | Lose Rate: Fluency | Win Rate: Consistency | Lose Rate: Consistency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-8B + TTBYS vs. GPT-5 | Base Model=Qwen3-8B, S... | 2026.05 | 55.23 | 26.47 | 34.56 | 25.12 | 42.78 | 26.31 | 35.44 | 29.18 | 28.62 | 31.04 |
| Qwen3-8B + TTBYS vs. GPT-5 + CoT | Base Model=Qwen3-8B, S... | 2026.05 | 45.19 | 29.44 | 37.33 | 25.78 | 42.51 | 36.29 | 33.22 | 27.65 | 30.11 | 28.47 |
| GPT-5 vs. GPT-5 + CoT | Comparison=Head-to-hea... | 2026.05 | 34.87 | 50.12 | 30.45 | 28.33 | 32.21 | 38.76 | 32.58 | 28.14 | 29.04 | 30.22 |
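
The win and lose rates above are easier to compare once each pair is reduced to a single number per dimension. The following is a minimal sketch of that arithmetic, not part of the benchmark's own tooling. The names `RESULTS` and `net_margin` are illustrative, and the figures are copied from the "Qwen3-8B + TTBYS vs. GPT-5" entry. Treating the values as percentages is an assumption, since the page does not state the unit.

```python
# Win and lose rates for "Qwen3-8B + TTBYS vs. GPT-5", copied from the results.
# Assumption: values are percentages; the page does not state the unit.
# Each entry maps an evaluation dimension to (win_rate, lose_rate).
RESULTS = {
    "Identification": (55.23, 26.47),
    "Empathy": (34.56, 25.12),
    "Persuasion": (42.78, 26.31),
    "Fluency": (35.44, 29.18),
    "Consistency": (28.62, 31.04),
}


def net_margin(win: float, lose: float) -> float:
    """Return win rate minus lose rate, rounded to two decimals.

    A positive value means the first system won more often than it lost
    on that dimension; a negative value means the opposite.
    """
    return round(win - lose, 2)


# Net margin per dimension (hypothetical summary, not reported on the page).
margins = {dim: net_margin(win, lose) for dim, (win, lose) in RESULTS.items()}

if __name__ == "__main__":
    for dim, margin in margins.items():
        print(f"{dim:15s} {margin:+.2f}")
```

A net margin is only one possible summary. It discards how many comparisons were neither wins nor losses, so two dimensions with the same margin may still differ in how often the outcome was undecided.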