
Long-horizon Mathematical Reasoning on MATH (Result and Process Metrics)

77.46 Result Accuracy

StaRPO

[Chart: Result Accuracy; axis ticks 46.5408, 54.5679, 62.595, 70.6221; Apr 10, 2026]
Updated 6d ago

Evaluation Results

Date      Result Accuracy   Second metric (unlabeled)
2026.04   77.46             76.89
2026.04   75.38             74.62
2026.04   74.43             71.97
2026.04   74.24             73.11
2026.04   67.61             65.72
2026.04   56.25             56.06
2026.04   54.55             52.08
2026.04   53.98             52.65
2026.04   53.6              49.62
2026.04   53.6              52.27
2026.04   52.65             50.57
2026.04   52.08             50.57
2026.04   49.81             47.35
2026.04   47.73             43.94