Mathematical Reasoning on BigMath (test)
Top result: 48.4 True Accuracy (Non-hacking)
[Chart: True Accuracy and RH Accuracy; plotted date Apr 17, 2026]
Updated 1mo ago
Evaluation Results
Method           Base Model       Date     True Accuracy  RH Accuracy
Non-hacking      Qwen2.5-3B-...   2026.04  48.4           -
RFT+GRIFT        Qwen2.5-3B-...   2026.04  37.1           45.7
RFT+Trace        Qwen2.5-3B-...   2026.04  35             47.1
RFT+Random       Qwen2.5-3B-...   2026.04  32             45.8
Starting-point   Qwen2.5-3B-...   2026.04  30.5           42
No-intervention  Qwen2.5-3B-...   2026.04  5.7            67.9