
Long-horizon Mathematical Reasoning on MATH (Result and Process Metrics)

Result Accuracy: 89.2

Qwen-2-72B-Instruct

[Chart: scores over time, Apr 10, 2026 to May 28, 2026]
Updated 5d ago

Evaluation Results

Date      Result   Process
2026.05   89.2     -
2026.05   82.7     -
2026.05   80.5     -
2026.05   80       -
2026.05   79.4     -
2026.05   78.2     -
2026.05   77.6     -
2026.04   77.46    76.89
2026.04   75.38    74.62
2026.04   74.43    71.97
2026.04   74.24    73.11
2026.04   67.61    65.72
2026.05   65.4     -
2026.04   56.25    56.06
2026.04   54.55    52.08
2026.04   53.98    52.65
2026.04   53.6     49.62
2026.04   53.6     52.27
2026.04   52.65    50.57
2026.04   52.08    50.57
2026.04   49.81    47.35
2026.04   47.73    43.94
2026.05   45.5     -