Mathematical Reasoning on GSM8K (Random, Reward, ∆ (pp))

Top entry: Sparse Reward (Random Baseline: 90.4)

Updated 3mo ago

Evaluation Results

Method	Random	Reward	∆ (pp)
Sparse Reward 2025.10	90.4	93	2.5
Dense Reward 2025.10	89.6	91.4	1.8
Interval Reward 2025.10	87.8	91.2	3.5
Sparse Reward 2025.10	85.8	88.8	3
Sparse Reward 2025.10	80.6	82.1	1.5
Interval Reward 2025.10	78.8	82.5	3.7
Interval Reward 2025.10	67.6	78.2	10.7
Dense Reward 2025.10	64.6	71.1	6.5
Dense Reward 2025.10	38.4	55.7	17.4
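
The ∆ (pp) column appears to be the Reward score minus the Random score, in percentage points. The sketch below recomputes it from the scores in the table under that assumption. The function name `delta_pp` and the `ROWS` layout are illustrative and not part of the leaderboard.

```python
# Rows copied from the table: (method, random, reward, listed delta in pp).
ROWS = [
    ("Sparse Reward", 90.4, 93.0, 2.5),
    ("Dense Reward", 89.6, 91.4, 1.8),
    ("Interval Reward", 87.8, 91.2, 3.5),
    ("Sparse Reward", 85.8, 88.8, 3.0),
    ("Sparse Reward", 80.6, 82.1, 1.5),
    ("Interval Reward", 78.8, 82.5, 3.7),
    ("Interval Reward", 67.6, 78.2, 10.7),
    ("Dense Reward", 64.6, 71.1, 6.5),
    ("Dense Reward", 38.4, 55.7, 17.4),
]


def delta_pp(random_score: float, reward_score: float) -> float:
    """Assumed definition: Reward minus Random, rounded to one decimal."""
    return round(reward_score - random_score, 1)


# Collect rows where the recomputed delta differs from the listed one.
mismatches = []
for method, random_score, reward_score, listed in ROWS:
    recomputed = delta_pp(random_score, reward_score)
    if abs(recomputed - listed) > 0.05:
        mismatches.append((method, random_score, reward_score, listed, recomputed))

for method, random_score, reward_score, listed, recomputed in mismatches:
    print(f"{method} ({random_score} -> {reward_score}): "
          f"listed {listed}, recomputed {recomputed}")
```

Under this definition, four rows differ from the listed value by 0.1 pp. One possible explanation, not confirmed by the page, is that the listed deltas were computed from unrounded scores before display.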