Outcome correctness on TPS-CalcBench core set v2

51.9 Exact Match

gpt-5.2

[Chart: Exact Match score over time, Apr 20, 2026]
Updated 1mo ago

Evaluation Results

Date     Exact Match  Col 2  Col 3  Col 4  Col 5  Col 6
2026.04  51.9         29     9.8    9.3    87.6   95
2026.04  48.1         27.9   10     14     85.2   95.7
2026.04  47.1         29.3   13.8   9.8    86.2   100
         43.1         28.1   10     18.8   81.7   87.6
         41.9         24.3   14.5   19.3   85     100
         38.6         28.1   14.5   18.8   75.7   100
2026.04  37.6         24.3   13.6   24.5   76.4   100
         33.6         28.3   14.8   23.3   72.1   100
         24.3         28.1   18.8   28.8   66.2   100
         9.8          13.8   24.3   52.1   52.6   100
         0            5      14.8   80.2   24.3   97.1
         0            4.5    9.8    85.7   18.8   97.9
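Each row of the Evaluation Results holds six values, led by Exact Match. In every row the first four values sum to 100, which suggests they split each model's outcomes into percentage shares. The page does not show headers for the other five columns. The minimal Python sketch below checks this property, assuming the rows parse as transcribed. The positional names (`col2` to `col6`) and the helper functions are hypothetical, not part of TPS-CalcBench.

```python
from typing import List, Tuple

# Values transcribed from the Evaluation Results rows; method names are not available.
# Tuple layout uses hypothetical positional names: (exact_match, col2, col3, col4, col5, col6).
Row = Tuple[float, float, float, float, float, float]

ROWS: List[Row] = [
    (51.9, 29.0, 9.8, 9.3, 87.6, 95.0),
    (48.1, 27.9, 10.0, 14.0, 85.2, 95.7),
    (47.1, 29.3, 13.8, 9.8, 86.2, 100.0),
    (43.1, 28.1, 10.0, 18.8, 81.7, 87.6),
    (41.9, 24.3, 14.5, 19.3, 85.0, 100.0),
    (38.6, 28.1, 14.5, 18.8, 75.7, 100.0),
    (37.6, 24.3, 13.6, 24.5, 76.4, 100.0),
    (33.6, 28.3, 14.8, 23.3, 72.1, 100.0),
    (24.3, 28.1, 18.8, 28.8, 66.2, 100.0),
    (9.8, 13.8, 24.3, 52.1, 52.6, 100.0),
    (0.0, 5.0, 14.8, 80.2, 24.3, 97.1),
    (0.0, 4.5, 9.8, 85.7, 18.8, 97.9),
]


def partition_violations(rows: List[Row], tol: float = 0.2) -> List[int]:
    """Return indices of rows whose first four values do not sum to 100 within `tol`.

    Each value is rounded to one decimal, so four rounded shares can drift
    from 100 by at most 4 * 0.05 = 0.2.
    """
    return [i for i, row in enumerate(rows) if abs(sum(row[:4]) - 100.0) > tol]


def is_ranked_by_exact_match(rows: List[Row]) -> bool:
    """Return True if rows are sorted by Exact Match (first value), highest first."""
    return all(a[0] >= b[0] for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    print("rows violating the 100% partition:", partition_violations(ROWS))
    print("sorted by Exact Match:", is_ranked_by_exact_match(ROWS))
```

Running the script prints an empty list of violations and `True`. Every listed row therefore partitions cleanly, and the leaderboard is ordered by Exact Match. The two-column tail (Col 5, Col 6) does not follow this pattern, so its meaning cannot be inferred from the numbers alone.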