
Arithmetic Reasoning on Countdown 0-shot (test)

Best result: SPG w/ EUBO, 71.5 Pass@1 (Greedy)


Evaluation Results

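| Method | Date | Pass@1 (Greedy) | (unlabeled) | (unlabeled) | (unlabeled) | (unlabeled) |
|---|---|---|---|---|---|---|
| SPG w/ EUBO | 2025.10 | 71.5 | 68.2 | 71.9 | 73.9 | 76.6 |
| | 2025.10 | 71.1 | 67.5 | 72.5 | 75.1 | 76.6 |
| | 2025.10 | 54.7 | 44.3 | 60.6 | 68 | 73.1 |
| | 2025.10 | 44.9 | 36.8 | 55.2 | 65 | 72.3 |
| | 2025.10 | 32.4 | 24.5 | 40.4 | 51.4 | 60.6 |
| | 2025.10 | 21.1 | 18.2 | 32.1 | 42.5 | 50 |
| | 2025.10 | 16.8 | 15.8 | 28.1 | 37.7 | 45.3 |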