STEM Reasoning on TheoremQA (Avg@2)
[Chart: Avg@2 by date. Top entry: RLPR, 55.4 Avg@2, Jan 21, 2026]
Evaluation Results
Method            Configuration              Links    Avg@2
RLPR              Base=Base, Verifier=None   2026.01  55.4
DARL              Base=Base, Verifier=None   2026.01  55.2
Oat-Zero          Base=Math, Verifier=Rule   2026.01  53.3
RLVR              Base=Base, Verifier=Rule   2026.01  52.2
General Reasoner  Base=Base, Verifier=Model  2026.01  52.1
SimpleRL-Zoo      Base=Math, Verifier=Rule   2026.01  51.1
SimpleRL-Zoo      Base=Base, Verifier=Rule   2026.01  49.5
TTRL              Base=Base, Verifier=Rule   2026.01  48.8
PRIME             Base=Math, Verifier=Rule   2026.01  47.7
VeriFree          Base=Base, Verifier=None   2026.01  47.6
Qwen2.5-7B-Inst   Base=-, Verifier=None      2026.01  47.3
Qwen2.5-7B        Base=-, Verifier=None      2026.01  41.4
DARL              Base=Inst, Verifier=None   2026.01  39.4
RLPR              Base=Inst, Verifier=None   2026.01  36.5
RLVR              Base=Inst, Verifier=Rule   2026.01  32
Llama3.1-8B-Inst  Base=-, Verifier=None      2026.01  31.3