Agentic Task Solving on ShellOps
Chart: Pass@3 on ShellOps (leader: A³, 0.462, May 8, 2026)
Updated 23d ago
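The page does not say how Pass@3 and Pass@5 are computed. A common convention, and the one assumed here, is the unbiased pass@k estimator used in HumanEval-style evaluations. For each task, n attempts are sampled and c of them succeed. pass@k = 1 − C(n−c, k) / C(n, k) is the probability that a random subset of k attempts contains at least one success, and the benchmark score is the mean over tasks. The sketch below is illustrative only. The function names and the per-task (n, c) outcomes are hypothetical and are not ShellOps data. Under this estimator pass@5 can never be lower than pass@3, and every row of the results table satisfies that ordering.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts sampled, c of them succeeded."""
    if k > n:
        raise ValueError("k must not exceed the number of sampled attempts n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset contains a success.
        return 1.0
    # 1 minus the probability that a random size-k subset has no success.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks; `results` holds one (n, c) pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes: 10 attempts per task, with 0, 1, 2 and 5 successes.
results = [(10, 0), (10, 1), (10, 2), (10, 5)]
print(f"pass@3 = {benchmark_pass_at_k(results, 3):.3f}")
print(f"pass@5 = {benchmark_pass_at_k(results, 5):.3f}")
```

Using the combinatorial form instead of the naive "c / n ≥ something" heuristic keeps the estimate unbiased when more than k attempts are sampled per task.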
Evaluation Results
Method                    Date      Pass@3   Pass@5
A³ (variant=σ-Reveal)     2026.05   0.462    0.557
A³ (variant=Vanilla)      2026.05   0.427    0.514
GiGPO                     2026.05   0.223    0.286
GSPO                      2026.05   0.219    0.277
HGPO                      2026.05   0.219    0.283
LATS                      2026.05   0.214    0.277
RetroAgent                2026.05   0.171    0.222
ReACT                     2026.05   0.166    0.206
rStar                     2026.05   0.138    0.175
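To express the gaps in the results table in relative terms, the sketch below copies the reported Pass@3 and Pass@5 values and compares each A³ variant against the strongest non-A³ entry on each metric. That entry is GiGPO for both metrics. Treating the best non-A³ method as the baseline is this sketch's own assumption, not something the leaderboard states, and the helper names are hypothetical.

```python
# Pass@3 and Pass@5 as reported in the ShellOps results table.
SCORES = {
    "A³ (σ-Reveal)": (0.462, 0.557),
    "A³ (Vanilla)": (0.427, 0.514),
    "GiGPO": (0.223, 0.286),
    "GSPO": (0.219, 0.277),
    "HGPO": (0.219, 0.283),
    "LATS": (0.214, 0.277),
    "RetroAgent": (0.171, 0.222),
    "ReACT": (0.166, 0.206),
    "rStar": (0.138, 0.175),
}


def best_baseline(metric: int) -> tuple[str, float]:
    """Highest-scoring non-A³ method for metric 0 (Pass@3) or 1 (Pass@5)."""
    baselines = {m: s[metric] for m, s in SCORES.items() if not m.startswith("A³")}
    name = max(baselines, key=baselines.get)
    return name, baselines[name]


def improvement(method: str, metric: int) -> tuple[float, float]:
    """Absolute gain and score ratio of `method` over the best baseline."""
    _, base = best_baseline(metric)
    score = SCORES[method][metric]
    return score - base, score / base


for method in ("A³ (σ-Reveal)", "A³ (Vanilla)"):
    for metric, label in ((0, "Pass@3"), (1, "Pass@5")):
        base_name, _ = best_baseline(metric)
        gain, ratio = improvement(method, metric)
        print(f"{method} vs {base_name} on {label}: +{gain:.3f} ({ratio:.2f}x)")
```

By this arithmetic, A³ with σ-Reveal scores about 2.07x GiGPO's Pass@3 (0.462 vs 0.223).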