Science Reasoning on GPQA (Pass@1 and Pass@16)
[Figure: Pass@1 by result date (Apr 20, 2026), with a Pass@1 / Pass@16 toggle; highest Pass@1: Base Model, 52]
Evaluation Results
| Method     | Setting                   | Date    | Pass@1 | Pass@16 |
|------------|---------------------------|---------|--------|---------|
| Base Model | Model Scale=Qwen3-4B,...  | 2026.04 | 52     | 84.4    |
| MIXED-CUTS | Model Scale=Qwen3-4B,...  | 2026.04 | 50.1   | 84.5    |
| GRPO       | Model Scale=Qwen3-4B,...  | 2026.04 | 48.1   | 84.6    |
| Base Model | Model Scale=Qwen3-4B,...  | 2026.04 | 45.3   | 83.9    |
| MIXED-CUTS | Model Scale=Qwen3-1.7B... | 2026.04 | 36     | 79.4    |
| Base Model | Model Scale=Qwen3-1.7B... | 2026.04 | 34.9   | 75.3    |
| GRPO       | Model Scale=Qwen3-1.7B... | 2026.04 | 34.2   | 80.1    |
| Base Model | Model Scale=Qwen3-1.7B... | 2026.04 | 32.1   | 80.3    |
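The page does not state how Pass@1 and Pass@16 are computed. Pass@k is the probability that at least one of k sampled answers to a question is correct. The following is a minimal sketch, assuming the widely used unbiased pass@k estimator: for each question, n answers are sampled and c of them are graded correct, and the benchmark score is the mean over questions, scaled to a percentage as the table appears to be. The function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input format are hypothetical, not taken from the leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one question: n samples drawn, c of them correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    size-k subset of the n samples contains no correct answer.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Too few incorrect samples to fill k slots, so every subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over questions; results holds one (n, c) pair per question."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three questions, 16 samples each.
    toy = [(16, 0), (16, 4), (16, 16)]
    print(f"Pass@1  = {100 * benchmark_pass_at_k(toy, 1):.1f}")   # 41.7
    print(f"Pass@16 = {100 * benchmark_pass_at_k(toy, 16):.1f}")  # 66.7
```

With n equal to k, as in Pass@16 from 16 samples, the estimate reduces to "was any sample correct", which is why Pass@16 can stay nearly flat across methods while Pass@1 differs by several points.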