Empathetic Dialogue on Chit Chat
[Chart: top Score 83.3 (Gemini2.5-Pro-0605), Jul 3, 2025. Series: Score, Success Rate, Failure Rate.]
Evaluation Results
| Method | Setting | Date | Score | Success Rate | Failure Rate |
|---|---|---|---|---|---|
| Gemini2.5-Pro-0605 | - | 2025.07 | 83.3 | 77 | 11 |
| OpenAI-o3-0416 | - | 2025.07 | 83 | 66 | 9 |
| GPT-4o-0326 | - | 2025.07 | 80.9 | 74 | 17 |
| GPT-4.1-0414 | - | 2025.07 | 77.1 | 65 | 18 |
| Gemini2.5-Flash-Think-0520 | - | 2025.07 | 64.7 | 53 | 27 |
| RLVER (PPO) | Think=Yes | 2025.07 | 62.1 | 52 | 30 |
| RLVER (PPO) | Think=No | 2025.07 | 53.4 | 39 | 37 |
| RLVER (GRPO) | Think=Yes | 2025.07 | 53 | 45 | 42 |
| RLVER (GRPO) | Think=No | 2025.07 | 49.2 | 34 | 40 |
| Qwen2.5-7B-Instruct | - | 2025.07 | 37.8 | 27 | 58 |