Scientific Reasoning on GPQA 1.0 (test)
[Chart: Accuracy over time. Top result: A3PO, 53.8 Accuracy (Dec 25, 2025).]
Evaluation Results
| Method | Model | Date | Accuracy |
|---|---|---|---|
| A3PO | Deepseek-R1-Dist... | 2025.12 | 53.8 |
| W-REINFORCE | Deepseek-R1-Dist... | 2025.12 | 51.4 |
| Lp-Reg | Deepseek-R1-Dist... | 2025.12 | 51.2 |
| DAPO w/ Fork Tokens | Deepseek-R1-Dist... | 2025.12 | 50.4 |
| A3PO | Qwen3-8B-Base | 2025.12 | 50.2 |
| DAPO | Deepseek-R1-Dist... | 2025.12 | 50.2 |
| GRPO | Deepseek-R1-Dist... | 2025.12 | 48.4 |
| Lp-Reg | Qwen3-8B-Base | 2025.12 | 47.8 |
| W-REINFORCE | Qwen3-8B-Base | 2025.12 | 47.4 |
| DAPO w/ Fork Tokens | Qwen3-8B-Base | 2025.12 | 47.2 |
| DAPO | Qwen3-8B-Base | 2025.12 | 45.8 |
| GRPO | Qwen3-8B-Base | 2025.12 | 45.3 |
| A3PO | Qwen2.5-7B-Math | 2025.12 | 39.1 |
| Lp-Reg | Qwen2.5-7B-Math | 2025.12 | 36.9 |
| DAPO w/ Fork Tokens | Qwen2.5-7B-Math | 2025.12 | 36.5 |
| W-REINFORCE | Qwen2.5-7B-Math | 2025.12 | 36.2 |
| DAPO | Qwen2.5-7B-Math | 2025.12 | 34.6 |
| GRPO | Qwen2.5-7B-Math | 2025.12 | 33.7 |