Reasoning evaluation on 109-sample (test)
[Chart: Accuracy by date. Top result: Universe Routing, 97.25 accuracy, Mar 16, 2026. Selectable metrics: Accuracy, Latency (ms), Speedup, P-Value (p†).]
Updated 1mo ago
Evaluation Results

Method            Params  Date     Accuracy  Latency (ms)  Speedup  P-Value (p†)
Universe Routing  465M    2026.03  97.25     16            1        -
Qwen3-Next        80B     2026.03  96.33     3,640         228      1
GLM-4.7           696B    2026.03  95.28     12,392        775      0.131
Kimi-K2.5         1T      2026.03  94.5      9,875         617      0.371
Cogito-2.1        671B    2026.03  94.44     1,413         88       0.289
GPT-OSS           120B    2026.03  91.74     1,986         124      0.077
DeepSeek-v3.1     671B    2026.03  87.96     4,090         256      0.01
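
The page does not state how the Speedup column is computed. The listed values are consistent with each method's latency divided by Universe Routing's 16 ms latency, rounded to the nearest integer. The sketch below checks that reading against the latencies in the results above; the formula and the names `latency_ms`, `baseline` and `speedup_vs_baseline` are assumptions for illustration, not something the leaderboard documents.

```python
import math

# Latencies in milliseconds, copied from the results above.
latency_ms = {
    "Universe Routing": 16,
    "Qwen3-Next": 3640,
    "GLM-4.7": 12392,
    "Kimi-K2.5": 9875,
    "Cogito-2.1": 1413,
    "GPT-OSS": 1986,
    "DeepSeek-v3.1": 4090,
}

# Assumption: Universe Routing is the reference method for the Speedup column.
baseline = latency_ms["Universe Routing"]


def speedup_vs_baseline(method: str) -> int:
    """Latency ratio relative to the baseline, rounded half up.

    math.floor(x + 0.5) is used instead of round() because Python's
    round() rounds exact halves to the nearest even integer.
    """
    ratio = latency_ms[method] / baseline
    return math.floor(ratio + 0.5)


speedups = {method: speedup_vs_baseline(method) for method in latency_ms}

for method, value in speedups.items():
    print(f"{method}: {value}")
```

Under this reading, a Speedup of 228 for Qwen3-Next means its measured latency is roughly 228 times that of Universe Routing, so larger values indicate slower methods relative to the baseline.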