LLM Behavior
Figure: Response Rate (RR) progress chart (metric toggle: Response Rate (RR) / Average Utility Rate (AUR)). Best Response Rate (RR): 95, DeepSeek-v4 (May 14, 2026).
Evaluation Results
| Method | Setting | Date | Response Rate (RR) | Average Utility Rate (AUR) |
|---|---|---|---|---|
| DeepSeek-v4 | Query Proximity=Goal-A... | 2026.05 | 95 | 85 |
| GPT-5.4 | Query Proximity=Goal-A... | 2026.05 | 94 | 42 |
| GPT-5.5 | Query Proximity=Goal-A... | 2026.05 | 92 | 54 |
| Sonnet-4.6 | Query Proximity=Goal-A... | 2026.05 | 91 | 69 |
| Kimi-K2.6 | Query Proximity=Goal-A... | 2026.05 | 91 | 78 |
| Gemini-3.1 | Query Proximity=Goal-A... | 2026.05 | 90 | 83 |
| Kimi-K2.6 | Query Proximity=Goal-D... | 2026.05 | 8 | 5 |
| GPT-5.5 | Query Proximity=Goal-D... | 2026.05 | 6 | 0 |
| Sonnet-4.6 | Query Proximity=Goal-D... | 2026.05 | 6 | 2 |
| GPT-5.4 | Query Proximity=Goal-D... | 2026.05 | 5 | 0 |
| DeepSeek-v4 | Query Proximity=Goal-D... | 2026.05 | 5 | 6 |
| Gemini-3.1 | Query Proximity=Goal-D... | 2026.05 | 3 | 5 |