Chain-of-Thought Reasoning on Average AIME25 AIME24 GPQA-Diamond

74.32 Accuracy

Vanilla

[Chart: Accuracy; Jun 1, 2026]
Updated 1d ago

Evaluation Results

Method Links
2026.06
74.3217,078-
2026.06
73.6912,196-
2026.06
73.316,1195.6
2026.06
72.2816,3544.2
2026.06
71.4510,83811.1
2026.06
70.916,1275.6
2026.06
68.8310,90510.6
2026.06
67.110,94410.3
2026.06
45.0212,169-
2026.06
44.4410,28115.5
2026.06
39.9710,37714.7
2026.06
36.3710,79311.3