
Mathematical Reasoning on MATH500 (Avg@4)

97.5 Accuracy (Avg@4)

GRPO

[Chart: Accuracy (Avg@4) over time, Dec 1, 2025 to Mar 11, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy (Avg@4)   Links
2025.12   97.5               -
2025.12   97.2               -
2025.12   96.8               -
2026.03   89.3               2,214
2026.03   87.2               1,783
2026.03   85.6               7,138
2026.03   84.6               1,931
2026.03   83.9               5,399
2026.03   81.9               2,520
2026.03   80.9               1,649