
Reasoning on Average (MATH500, AIME24, AIME25, GPQA_diamond)

58.71 Accuracy

InftyThink+

[Chart: accuracy over time, Jul 20, 2025 to Feb 6, 2026]
Updated 21d ago

Evaluation Results

Date     Accuracy  Other metrics
2026.02  58.71     15.69265.17--
2026.02  57.13     13.63534.57--
2026.02  56.66     10.32154.62--
2025.07  54.36     14,457-6.44-53.21
2026.02  53.96     20.02100.21--
2026.02  53.83     10.64320.73--
2026.02  53.67     11.35186.3--
2026.02  50.58     10.6648.37--
2025.07  48.23     12,120-3.46-46.4
2025.07  48.17     5,318-1.38-57.21
2026.02  47.31     14.45149.44--
2025.07  46.36     2,975-0.840.34
2026.02  44.06     14.2477.57--
2025.07  43.5      1,868---
2025.07  43.02     3,295-0.33-57.47
2025.07  42.19     1,967-0.98-2.98
2025.07  41.74     5,037-1.3-46.21
2026.02  41.69     12.1110.96--
2025.07  33.28     2,625-2.1526.25
2025.07  33.25     3,304-1.65-70.19
2025.07  31.72     8,537-1.66-35.48
2025.07  29.92     5,199-3.64-19.13