Graduate-level Question Answering on GPQA (Accuracy & Input Length)
Top result: RouteGoT, 65.7 accuracy (Mar 6, 2026)
[Chart: results over time; metrics: Accuracy, Average Input Tokens]
Evaluation Results

| Method   | Setting                  | Date    | Accuracy (%) | Avg. Input Tokens |
|----------|--------------------------|---------|-------------:|------------------:|
| RouteGoT | Model Pool={Qwen3-4B,... | 2026.03 |         65.7 |             3,696 |
| AGoT     | Backbone=Qwen3-30B, Ca...| 2026.03 |         64.6 |            35,564 |
| CoT      | Backbone=Qwen3-30B, Ca...| 2026.03 |         63.1 |               247 |
| KNN      | Model Pool={Qwen3-4B,... | 2026.03 |         61.1 |             4,039 |
| Random   | Model Pool={Qwen3-4B,... | 2026.03 |         60.1 |             7,823 |
| GoT*     | Backbone=Qwen3-30B, Ca...| 2026.03 |         59.6 |            16,826 |
| EmbedLLM | Model Pool={Qwen3-4B,... | 2026.03 |         59.6 |            15,387 |
| RouteLLM | Model Pool={Qwen3-4B,... | 2026.03 |         57.6 |             4,358 |
| RTR      | Model Pool={Qwen3-4B,... | 2026.03 |         56.6 |             3,998 |
| ToT      | Backbone=Qwen3-30B, Ca...| 2026.03 |         44.9 |             4,928 |
| IO       | Backbone=Qwen3-30B, Ca...| 2026.03 |         41.4 |               250 |
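The benchmark reports two axes, accuracy and average input tokens, so one way to read the results is to ask which methods no other listed method beats on both at once. Below is a minimal Python sketch of that Pareto-frontier check, using only the accuracy and token numbers from the results above. It is not part of the benchmark's own tooling. The names `Result`, `dominates`, and `pareto_frontier` are hypothetical, and the truncated Setting column is left out.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    method: str
    accuracy: float        # GPQA accuracy as reported in the table
    avg_input_tokens: int  # average input tokens as reported in the table


# Numbers copied from the results table; settings are omitted because they are truncated.
RESULTS = [
    Result("RouteGoT", 65.7, 3696),
    Result("AGoT", 64.6, 35564),
    Result("CoT", 63.1, 247),
    Result("KNN", 61.1, 4039),
    Result("Random", 60.1, 7823),
    Result("GoT*", 59.6, 16826),
    Result("EmbedLLM", 59.6, 15387),
    Result("RouteLLM", 57.6, 4358),
    Result("RTR", 56.6, 3998),
    Result("ToT", 44.9, 4928),
    Result("IO", 41.4, 250),
]


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on at least one.

    Higher accuracy is better; fewer input tokens is better.
    """
    no_worse = a.accuracy >= b.accuracy and a.avg_input_tokens <= b.avg_input_tokens
    strictly_better = a.accuracy > b.accuracy or a.avg_input_tokens < b.avg_input_tokens
    return no_worse and strictly_better


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Return the non-dominated results, sorted by increasing input tokens."""
    frontier = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(frontier, key=lambda r: r.avg_input_tokens)


if __name__ == "__main__":
    for r in pareto_frontier(RESULTS):
        print(f"{r.method}: accuracy {r.accuracy}, {r.avg_input_tokens} input tokens")
```

On these numbers only CoT (63.1 at 247 tokens) and RouteGoT (65.7 at 3,696 tokens) come out non-dominated. This treats input tokens as the only cost. It ignores the fact that the methods run on different backbones and model pools.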