Math Reasoning on Math Reasoning 1.5B model (val)

Validation Accuracy: 69.4

Execution-Guided Search

Updated 5mo ago

Evaluation Results

Method	Date	Validation Accuracy
Execution-Guided Search	2026.01	69.4
Best Human Expert	2026.01	68.8
Baseline	2026.01	48.0