
Logical Reasoning on StepGame k=4

Best accuracy: 93.8 (LLM-ASP)


Evaluation Results

Date      Accuracy
2024.12   93.8
2024.12   93.3
2024.12   93.2
2024.12   92.9
2024.12   92.6
2024.12   92.6
2024.12   92.6
2024.12   89.8
2024.12   89.7
2024.12   89.6
2024.12   89.5
2024.12   89.4
2024.12   89.3
2024.12   89.0
2024.12   88.3
2024.12   88.2
2024.12   87.6
2024.12   87.0
2024.12   84.7
2024.12   83.7
2024.12   82.7
2024.12   76.9
2024.12   76.9
2024.12   75.9
2024.12   75.5
2024.12   70.6
2024.12   70.2
2024.12   68.0
2024.12   65.8
2024.12   64.1
2024.12   63.5
2024.12   63.5
2024.12   63.2
2024.12   61.8
2024.12   61.6
2024.12   61.0
2024.12   60.1
2024.12   59.8
2024.12   59.8
2024.12   57.9
2024.12   52.5
2024.12   51.7
2024.12   51.3
2024.12   50.7
2024.12   50.6
2024.12   50.2
2024.12   44.8
2024.12   40.9
2024.12   40.1
2024.12   38.2
2024.12   36.4
2024.12   28.0
2024.12   26.7
2024.12   22.5
2024.12   20.8
2024.12   20.0