Agentic Task Solving on AB-OS
[Chart: Pass@3 score; highlighted point: ReACT, 79.7, May 8, 2026. Available metrics: Pass@3, Pass@5.]
Updated 23d ago
Evaluation Results
| Method | Date | Pass@3 | Pass@5 |
|---|---|---|---|
| ReACT | 2026.05 | 79.7 | 86.2 |
| GSPO | 2026.05 | 79 | 82.8 |
| A³ (variant=σ-Reveal) | 2026.05 | 76.2 | 79.3 |
| A³ (variant=Vanilla) | 2026.05 | 72.1 | 72.4 |
| GiGPO | 2026.05 | 67.6 | 72.4 |
| HGPO | 2026.05 | 62.8 | 69 |
| RetroAgent | 2026.05 | 62.8 | 69 |
| LATS | 2026.05 | 59.3 | 65.5 |
| rStar | 2026.05 | 53.1 | 58.6 |