Agentic Task Solving on AB-DB
[Chart: Pass@3 and Pass@5 over time. Top point: A³, Pass@3 51.1, May 8, 2026.]
Updated 23d ago
Evaluation Results
Method                 Date     Pass@3  Pass@5
A³ (variant=Vanilla)   2026.05  51.1    55.3
A³ (variant=σ-Reveal)  2026.05  50.3    55
GSPO                   2026.05  42.8    45.8
RetroAgent             2026.05  41.8    44.7
ReACT                  2026.05  38.2    41.2
HGPO                   2026.05  37.8    41.2
LATS                   2026.05  37.7    41.2
rStar                  2026.05  36.1    41.2
GiGPO                  2026.05  35.8    38.9