Empathetic Dialogue on Sentient Benchmark
[Chart: Score, Success Rate, and Failure Rate by date. Best score: 82.4, Gemini2.5-Pro-0605, Jul 3, 2025.]
Evaluation Results
| Method | Variant | Date | Score | Success Rate | Failure Rate |
|---|---|---|---|---|---|
| Gemini2.5-Pro-0605 | - | 2025.07 | 82.4 | 55 | 4 |
| GPT-4o-0326 | - | 2025.07 | 79.9 | 51 | 4 |
| RLVER (PPO) | Think=Yes | 2025.07 | 79.2 | 42 | 9 |
| RLVER (GRPO) | Think=Yes | 2025.07 | 72 | 34 | 10 |
| RLVER (GRPO) | Think=No | 2025.07 | 68.3 | 26 | 10 |
| GPT-4.1-0414 | - | 2025.07 | 68.2 | 35 | 13 |
| Gemini2.5-Flash-Think-0520 | - | 2025.07 | 66.1 | 39 | 14 |
| OpenAI-o3-0416 | - | 2025.07 | 62.7 | 32 | 14 |
| RLVER (PPO) | Think=No | 2025.07 | 61.7 | 24 | 23 |
| Qwen2.5-7B-Instruct | - | 2025.07 | 13.3 | 2 | 76 |