General Reasoning Performance on Aggregate (MATH-500, AIME24, AIME25, GPQA-diamond)

Accuracy: 71.49

MUR

[Chart: Accuracy, Jul 20, 2025]
Updated 22d ago

Evaluation Results

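| Method | Date | Accuracy | (unlabeled) | (unlabeled) | (unlabeled) | Links |
|---|---|---|---|---|---|---|
| | 2025.07 | 71.49 | 0.39 | 10,918 | 1.4 | |
| | 2025.07 | 71.1 | - | 10,767 | - | |
| | 2025.07 | 69.23 | -1.87 | 11,259 | 4.57 | |
| | 2025.07 | 69.1 | 2.12 | 9,393 | -10.64 | |
| | 2025.07 | 68.72 | 1.74 | 10,593 | 0.76 | |
| | 2025.07 | 68.71 | 1.72 | 11,548 | 9.85 | |
| | 2025.07 | 67.7 | -3.4 | 12,784 | 18.73 | |
| | 2025.07 | 66.98 | - | 10,512 | - | |
| | 2025.07 | 52.02 | 2.74 | 10,421 | -9.67 | |
| | 2025.07 | 51.53 | 2.25 | 10,942 | -5.15 | |
| | 2025.07 | 51.16 | 1.88 | 11,810 | 2.37 | |
| | 2025.07 | 49.28 | - | 11,536 | - | |
| | 2025.07 | 43.56 | - | 2,212 | - | |
| | 2025.07 | 40.03 | - | 1,768 | - | |
| | 2025.07 | 30.74 | - | 2,662 | - | |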