Multiple-choice Reasoning on GPQA (test)
[Chart: top result is RouteGoT with 65.7 Accuracy, Mar 6, 2026. Metrics shown: Accuracy, Average Output Tokens.]
Evaluation Results
Method     Configuration                Date      Accuracy   Average Output Tokens
RouteGoT   Backbone={Qwen3-4B, 8B...    2026.03   65.7       3,352
AGoT       Backbone=Qwen3-30B           2026.03   64.6       12,179
CoT        Backbone=Qwen3-30B           2026.03   63.1       4,965
KNN        Backbone={Qwen3-4B, 8B...    2026.03   61.1       3,292
Random     Backbone={Qwen3-4B, 8B...    2026.03   60.1       5,658
GoT*       Backbone=Qwen3-30B           2026.03   59.6       9,468
EmbedLLM   Backbone={Qwen3-4B, 8B...    2026.03   59.6       11,369
RouteLLM   Backbone={Qwen3-4B, 8B...    2026.03   57.6       3,640
RTR        Backbone={Qwen3-4B, 8B...    2026.03   56.6       3,224
ToT        Backbone=Qwen3-30B           2026.03   44.9       9,077
IO         Backbone=Qwen3-30B           2026.03   41.4       7