Autoformalization on Gaokao Formal
Best Mean Score: 74.2 (SFT+GRPO-0%)
[Chart: Mean Score and Mean (neq 0) by date; date label shown: Apr 15, 2026]
Updated 3d ago
Evaluation Results
| Method | Configuration | Date | Mean Score | Mean (neq 0) |
|---|---|---|---|---|
| SFT+GRPO-0% | sampling_strategy=Best... | 2026.04 | 74.2 | 80.9 |
| Mathesis-7B | sampling_strategy=Best... | 2026.04 | 73.2 | 78.5 |
| Kimina-7B | sampling_strategy=Best... | 2026.04 | 73.1 | 76.8 |
| SFT | sampling_strategy=Best... | 2026.04 | 71.9 | 80.5 |
| SFT+GRPO-30% | sampling_strategy=Best... | 2026.04 | 71.6 | 79.1 |
| SFT+GRPO-100% | sampling_strategy=Best... | 2026.04 | 71.6 | 80.4 |
| GRPO-only | sampling_strategy=Best... | 2026.04 | 42.3 | 71.2 |
| Base | sampling_strategy=Best... | 2026.04 | 21.5 | 67.8 |