Mathematical Reasoning on AIME Out-of-distribution 2024 (test)
[Chart: Accuracy by date. Plotted point: RLTR, 14.8 (Feb 9, 2026). Selectable metrics: Accuracy, Majority Vote @4, Majority Vote @16, Majority Vote @64.]
Updated 3d ago
Evaluation Results
Method      Settings               Date     Accuracy  Majority Vote @4  Majority Vote @16  Majority Vote @64
RLTR        evaluation_samples=64  2026.02  14.8      18.9              20.0               21.1
RLVR        evaluation_samples=64  2026.02  11.6      14.4              17.8               18.9
Base model  evaluation_samples=64  2026.02  9.8       10.0              13.3               16.7
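
The page does not say how the Majority Vote @k columns are scored. The following is a minimal sketch of one common reading: for each problem, take the most frequent final answer among the first k sampled answers and compare it with the reference. The function names, the tie-breaking rule, and the toy data are all assumptions made for illustration, not the benchmark's actual evaluation code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def majority_vote_accuracy(samples, references, k):
    """Percentage of problems whose majority answer over the first k samples
    matches the reference answer (assumed scoring rule)."""
    correct = sum(
        majority_vote(answers[:k]) == ref
        for answers, ref in zip(samples, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical sampled answers: one list of 4 samples per problem.
samples = [
    ["204", "204", "113", "204"],
    ["25", "73", "73", "73"],
    ["33", "540", "601", "540"],
]
references = ["204", "73", "540"]

for k in (1, 4):
    print(f"Majority Vote @{k}: {majority_vote_accuracy(samples, references, k):.1f}")
```

On this toy data, voting over more samples recovers problems whose first sample was wrong, which matches the direction of the trend in the table, where each method's score rises from @4 to @64.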