Scientific Reasoning on GPQA Main (Accuracy, Tokens)
[Chart: Accuracy; highlighted point 20.8 (Static), Apr 6, 2026. Available metrics: Accuracy, Tokens Used.]
Updated 10d ago
Evaluation Results
Method                  Configuration              Links    Accuracy  Tokens Used
Static                  Budget=4096                2026.04  20.8      10,994
TAB                     Budget (B)=10k             2026.04  19.4       6,415
LLM-Judge Multi-Turn    Selection Strategy=Mul...  2026.04  18.1       5,712
Static                  Budget=2048                2026.04  17.4       5,910
LLM-Judge Individual    Selection Strategy=Ind...  2026.04  17.4       5,940
TAB                     Budget (B)=8k              2026.04  17.2       4,843
Static                  Budget=1024                2026.04  16.7       3,929
TAB                     Budget (B)=5k              2026.04  16.5       2,971
TAB                     Budget (B)=3k              2026.04  15.6       2,258
Static                  Budget=512                 2026.04  14.7       2,544
Static                  Budget=256                 2026.04  14.3       1,822
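
Since the benchmark reports both Accuracy and Tokens Used, one way to read the results is as a trade-off between the two. The sketch below is illustrative only: it assumes that higher accuracy and fewer tokens are both preferable (the page does not state this), and the names `ROWS` and `pareto_front` are hypothetical. It lists the configurations for which no other entry is at least as accurate while using no more tokens.

```python
# Rows copied from the results table: (method, configuration, accuracy, tokens_used).
# Truncated configuration labels ("Mul...", "Ind...") are kept exactly as shown on the page.
ROWS = [
    ("Static", "Budget=4096", 20.8, 10994),
    ("TAB", "Budget (B)=10k", 19.4, 6415),
    ("LLM-Judge Multi-Turn", "Selection Strategy=Mul...", 18.1, 5712),
    ("Static", "Budget=2048", 17.4, 5910),
    ("LLM-Judge Individual", "Selection Strategy=Ind...", 17.4, 5940),
    ("TAB", "Budget (B)=8k", 17.2, 4843),
    ("Static", "Budget=1024", 16.7, 3929),
    ("TAB", "Budget (B)=5k", 16.5, 2971),
    ("TAB", "Budget (B)=3k", 15.6, 2258),
    ("Static", "Budget=512", 14.7, 2544),
    ("Static", "Budget=256", 14.3, 1822),
]


def pareto_front(rows):
    """Return the rows that are not dominated on (accuracy up, tokens down).

    Assumption: a row is dominated if an entry using no more tokens
    reaches at least the same accuracy.
    """
    front = []
    best_accuracy = float("-inf")
    # Sweep from cheapest to most expensive; on equal token counts,
    # visit the more accurate row first.
    for row in sorted(rows, key=lambda r: (r[3], -r[2])):
        if row[2] > best_accuracy:
            front.append(row)
            best_accuracy = row[2]
    return front


front = pareto_front(ROWS)
dominated = [row for row in ROWS if row not in front]

print("Not dominated:")
for method, config, accuracy, tokens in front:
    print(f"  {method:<22}{config:<27}{accuracy:>5.1f}{tokens:>8,}")

print("Dominated:")
for method, config, accuracy, tokens in dominated:
    print(f"  {method:<22}{config:<27}{accuracy:>5.1f}{tokens:>8,}")
```

Under that assumption, three entries are dominated: Static with Budget=2048 and LLM-Judge Individual (both 17.4, behind LLM-Judge Multi-Turn at 18.1 with fewer tokens), and Static with Budget=512 (14.7, behind TAB with Budget (B)=3k at 15.6 with fewer tokens). This is a reading of the listed numbers only and says nothing about statistical significance.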