Agent Action on Agent Action subset
[Chart: leading result by date. GPT-5.4, RR 98, May 14, 2026. Metrics: RR, AUR.]
Updated 16d ago
Evaluation Results

| Model | Method | Date | RR | AUR |
|---|---|---|---|---|
| GPT-5.4 | Query Proximity=Goal-A... | 2026.05 | 98 | 83 |
| GPT-5.5 | Query Proximity=Goal-A... | 2026.05 | 96 | 79 |
| Gemini-3.1 | Query Proximity=Goal-A... | 2026.05 | 95 | 89 |
| Kimi-K2.6 | Query Proximity=Goal-A... | 2026.05 | 95 | 82 |
| DeepSeek-v4 | Query Proximity=Goal-A... | 2026.05 | 95 | 80 |
| Sonnet-4.6 | Query Proximity=Goal-A... | 2026.05 | 94 | 60 |
| Kimi-K2.6 | Query Proximity=Goal-D... | 2026.05 | 18 | 12 |
| DeepSeek-v4 | Query Proximity=Goal-D... | 2026.05 | 15 | 11 |
| Sonnet-4.6 | Query Proximity=Goal-D... | 2026.05 | 14 | 6 |
| GPT-5.4 | Query Proximity=Goal-D... | 2026.05 | 13 | 14 |
| GPT-5.5 | Query Proximity=Goal-D... | 2026.05 | 13 | 13 |
| Gemini-3.1 | Query Proximity=Goal-D... | 2026.05 | 13 | 17 |