Mathematical Reasoning on AMC Out-of-distribution 2023 (test)
Chart (Feb 9, 2026): Accuracy, Maj@4, Maj@16 and Maj@64 by method; top Accuracy 0.535 (RLTR).
Evaluation Results
| Method | evaluation_samples | Date | Accuracy | Maj@4 | Maj@16 | Maj@64 |
|---|---|---|---|---|---|---|
| RLTR | 64 | 2026.02 | 0.535 | 0.592 | 0.667 | 0.675 |
| RLVR | 64 | 2026.02 | 0.528 | 0.558 | 0.625 | 0.617 |
| Base model | 64 | 2026.02 | 0.462 | 0.517 | 0.592 | 0.608 |
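The page does not define its metrics. The sketch below shows one common way such scores are computed from 64 sampled answers per problem, under two assumptions: "Accuracy" is the mean per-sample correctness, and "Maj@k" is a majority vote over the first k samples, with ties going to the answer seen first. Some evaluations instead average Maj@k over random k-subsets, so this is illustrative only. All function names (`mean_accuracy`, `majority_correct`, `benchmark_scores`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def mean_accuracy(samples: Sequence[str], reference: str) -> float:
    """Assumed 'Accuracy': fraction of individual samples equal to the reference."""
    return sum(s == reference for s in samples) / len(samples)


def majority_correct(samples: Sequence[str], reference: str, k: int) -> bool:
    """Assumed 'Maj@k': majority vote over the first k samples of one problem."""
    if k > len(samples):
        raise ValueError(f"need at least {k} samples, got {len(samples)}")
    # Counter.most_common orders equal counts by first occurrence,
    # so a tie is resolved in favour of the earliest answer.
    voted, _ = Counter(samples[:k]).most_common(1)[0]
    return voted == reference


def benchmark_scores(
    problems: Iterable[Tuple[Sequence[str], str]], ks=(4, 16, 64)
) -> dict:
    """problems: (sampled_answers, reference_answer) pairs, one per test problem."""
    problems = list(problems)
    n = len(problems)
    scores = {"Accuracy": sum(mean_accuracy(s, r) for s, r in problems) / n}
    for k in ks:
        hits = sum(majority_correct(s, r, k) for s, r in problems)
        scores[f"Maj@{k}"] = hits / n
    return scores


if __name__ == "__main__":
    # Toy data with 4 samples per problem (hypothetical, not from the benchmark).
    toy = [
        (["12", "12", "7", "12"], "12"),  # 3/4 correct; vote -> "12" (correct)
        (["3", "5", "5", "3"], "5"),      # 2/4 correct; tie -> "3" (wrong)
    ]
    print(benchmark_scores(toy, ks=(4,)))  # {'Accuracy': 0.625, 'Maj@4': 0.5}
```

With real benchmark output, each problem would carry 64 sampled answers, so the default `ks=(4, 16, 64)` applies directly.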