Algorithmic Reasoning on BBEH Mini
[Chart: Accuracy / Tokens. Top result: Static, 17.8 accuracy, Apr 6, 2026. Updated 10d ago.]
Evaluation Results
| Method | Configuration | Date | Accuracy | Tokens |
|---|---|---|---|---|
| Static | Budget=4096 | 2026.04 | 17.8 | 12,428 |
| TAB | Budget (B)=10k | 2026.04 | 17 | 5,236 |
| Static | Budget=2048 | 2026.04 | 16.5 | 6,984 |
| TAB | Budget (B)=8k | 2026.04 | 15.7 | 4,644 |
| Static | Budget=1024 | 2026.04 | 15.4 | 4,394 |
| TAB | Budget (B)=5k | 2026.04 | 15.4 | 3,867 |
| Static | Budget=512 | 2026.04 | 14.6 | 2,909 |
| TAB | Budget (B)=3k | 2026.04 | 14.6 | 2,238 |
| LLM-Judge Individual | Selection Strategy=Ind... | 2026.04 | 13.9 | 2,152 |
| LLM-Judge Multi-Turn | Selection Strategy=Mul... | 2026.04 | 13.9 | 1,949 |
| Static | Budget=256 | 2026.04 | 13.7 | 2,054 |