
Logical Reasoning on StepGame k=3

Accuracy: 89.5

PoT-LLM

Dec 23, 2024
Updated 24d ago

Evaluation Results

Date      Accuracy
2024.12   89.5
2024.12   89.4
2024.12   89.3
2024.12   89.2
2024.12   89.2
2024.12   89.2
2024.12   89.1
2024.12   88.8
2024.12   88.2
2024.12   88.2
2024.12   88
2024.12   86.4
2024.12   85.7
2024.12   85.3
2024.12   84.9
2024.12   83.7
2024.12   82.8
2024.12   82.4
2024.12   81.1
2024.12   81
2024.12   76.4
2024.12   76
2024.12   75.6
2024.12   75.3
2024.12   73.7
2024.12   73.4
2024.12   72.9
2024.12   72.4
2024.12   71.4
2024.12   70.6
2024.12   70
2024.12   69.4
2024.12   69.3
2024.12   69.3
2024.12   69.1
2024.12   68.6
2024.12   68.5
2024.12   67.5
2024.12   67.5
2024.12   67.4
2024.12   61.3
2024.12   60
2024.12   59
2024.12   58.3
2024.12   57.4
2024.12   55.3
2024.12   50.9
2024.12   45.1
2024.12   44.7
2024.12   41.2
2024.12   36.6
2024.12   31.1
2024.12   30.7
2024.12   24.4
2024.12   24
2024.12   21.3