Graduate-level Reasoning on SuperGPQA (Pass@1)
[Chart: Pass@1 over time. Best result shown: Rule+CER, 36 Pass@1 (Mar 11, 2026).]
Updated 2mo ago
Evaluation Results
Method            Configuration              Date     Pass@1
Rule+CER          Backbone=Qwen3-8B-Base...  2026.03  36
Rule-based        Backbone=Qwen3-8B-Base...  2026.03  35.1
CER               Backbone=Qwen3-8B-Base...  2026.03  35
General-verifier  Backbone=Qwen3-8B-Base...  2026.03  34.9
Exact-match       Backbone=Qwen3-8B-Base...  2026.03  33.5
VeriFree          Backbone=Qwen3-8B-Base...  2026.03  32.4
Rule-based        Backbone=Qwen3-4B-Base...  2026.03  32.2
CER               Backbone=Qwen3-4B-Base...  2026.03  32.1
Rule+CER          Backbone=Qwen3-4B-Base...  2026.03  31.3
VeriFree          Backbone=Qwen3-4B-Base...  2026.03  30.9
General-verifier  Backbone=Qwen3-4B-Base...  2026.03  30.9
Base              Backbone=Qwen3-8B-Base...  2026.03  27
Exact-match       Backbone=Qwen3-4B-Base...  2026.03  24.2
Base              Backbone=Qwen3-4B-Base...  2026.03  21