
StepGame

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | Stepgame k=10 | Accuracy | 88.1 | 56 |
| Logical Reasoning | Stepgame k=4 | Accuracy | 93.8 | 56 |
| Logical Reasoning | Stepgame k=3 | Accuracy | 89.5 | 56 |
| Multi-hop spatial reasoning | StepGame larger k generalization (test) | Accuracy (k=6) | 28.53 | 6 |
| Multi-hop spatial reasoning | StepGame with distracting noise (test) | k=1 Accuracy | 85.77 | 6 |