Reasoning on GSM8K

[Chart: Accuracy over time, Sep 22, 2025 to Mar 20, 2026; series label: GPT-5.2]
Updated 2d ago

Evaluation Results

Date     Accuracy  (unlabeled)  Links
2026.02  1         -            -
2026.02  1         -            -
2026.02  1         -            -
2026.02  0.99      -            -
2026.02  0.961     -            -
2026.02  0.954     -            -
2026.02  0.943     -            -
2026.02  0.9333    -            -
2026.02  0.9242    -            -
2026.02  0.9234    -            -
2026.02  0.9181    -            -
2026.03  0.872     -            -
2026.03  0.87      -            -
2026.03  0.866     -            -
2026.01  0.8575    -            -
2025.11  0.8541    -            -
2026.01  0.8471    -            -
2026.01  0.837     -            -
2026.01  0.8362    -            -
2026.03  0.836     -            -
2026.03  0.834     -            -
2026.03  0.834     -            -
2026.03  0.832     -            -
2026.01  0.8317    -            -
2026.03  0.83      -            -
2026.03  0.826     -            -
2026.03  0.822     -            -
2026.03  0.818     -            -
2025.11  0.8179    16.71        -
2025.11  0.8173    -            -
2025.11  0.8146    25.72        -
2026.02  0.8118    -            -
2026.02  0.81      -            -
2026.03  0.808     -            -
2026.03  0.806     -            -
2026.03  0.804     -            -
2026.03  0.802     -            -
2025.11  0.8007    -            -
2026.03  0.8006    -            -
2026.01  0.7991    -            -
2026.01  0.7983    -            -
2026.02  0.7885    -            -
2025.11  0.7792    16.19        -
2026.01  0.7703    -            -
2026.01  0.7665    -            -
2026.03  0.758     -            -
2026.02  0.757     -            -
2026.03  0.756     -            -
2026.03  0.756     -            -
2026.03  0.754     -            -
2026.01  0.7473    -            -
2025.11  0.7386    -            -
2026.03  0.734     -            -
2026.01  0.7309    -            -
2026.01  0.7301    -            -
2026.01  0.7271    -            -
2025.11  0.7267    -            -
2025.11  0.7217    10.42        -
2026.01  0.7172    -            -
2026.01  0.7149    -            -
2026.02  0.711     -            -
2026.03  0.706     -            -
2026.03  0.704     -            -
2025.11  0.7031    11.81        -
2025.09  0.7004    -            -
2026.02  0.697     -            -
2026.02  0.694     -            -
2025.09  0.6909    -            -
2025.09  0.689     -            -
2025.09  0.6883    -            -
2025.09  0.6872    -            -
2025.09  0.6853    -            -
2025.09  0.6834    -            -
2026.01  0.6755    -            -
2025.09  0.672     -            -
2025.09  0.6644    -            -
2026.02  0.663     -            -
2025.11  0.6508    -            -
2025.09  0.6464    -            -
2026.01  0.6414    -            -
2026.01  0.6361    -            -
2026.01  0.6325    -            -
2026.01  0.6308    -            -
2026.01  0.6186    -            -
2025.11  0.6175    -            -
2025.11  0.6173    -            -
2026.01  0.6118    -            -
-        0.604     -            -
2026.02  0.6       -            -
2026.01  0.5959    -            -
2026.01  0.5921    -            -
2025.11  0.585     -            -
2025.11  0.5573    -            -
2025.09  0.4511    -            -
2025.09  0.4106    -            -
2025.09  0.4       -            -
2025.09  0.3981    -            -
2025.09  0.396     -            -
2025.09  0.38      -            -
2025.09  0.3639    -            -
Showing 100 of 155 rows