Science Reasoning on GPQA (Avg@4)
[Chart: Avg@4 by date. Highlighted point: DARL, 39.4 Avg@4 (Jan 21, 2026)]
Updated 3d ago
Evaluation Results
| Method | Base | Verifier | Date | Avg@4 |
|---|---|---|---|---|
| DARL | Base | None | 2026.01 | 39.4 |
| Oat-Zero | Math | Rule | 2026.01 | 38.8 |
| SimpleRL-Zoo | Math | Rule | 2026.01 | 38.4 |
| RLPR | Base | None | 2026.01 | 37.6 |
| General Reasoner | Base | Model | 2026.01 | 37.4 |
| DARL | Inst | None | 2026.01 | 36.9 |
| VeriFree | Base | None | 2026.01 | 36.7 |
| RLPR | Inst | None | 2026.01 | 36.5 |
| SimpleRL-Zoo | Base | Rule | 2026.01 | 36.2 |
| RLVR | Base | Rule | 2026.01 | 36.2 |
| RLVR | Inst | Rule | 2026.01 | 36.0 |
| Qwen2.5-7B-Inst | - | None | 2026.01 | 34.2 |
| TTRL | Base | Rule | 2026.01 | 34.1 |
| Qwen2.5-7B | - | None | 2026.01 | 32.4 |
| PRIME | Math | Rule | 2026.01 | 32.1 |
| Llama3.1-8B-Inst | - | None | 2026.01 | 31.6 |