Multi-hop spatial reasoning on StepGame larger k generalization (test)
[Chart: Accuracy (k=6) over time. Best result: TP-MANN, 28.53 (Apr 18, 2022). Selectable metrics: Accuracy (k=6), Accuracy (k=7), Accuracy (k=8), Accuracy (k=9), Accuracy (k=10), Mean Accuracy.]
Evaluation Results
Columns k=6 to k=10 give accuracy (%) at each k; the last column is Mean Accuracy.

Method    Paper                      Date      k=6     k=7     k=8     k=9     k=10    Mean Accuracy
TP-MANN   zero-shot generalizati...  2022.04   28.53   26.45   23.67   22.52   21.46   24.53
TPR-RNN   zero-shot generalizati...  2022.04   22.25   19.88   15.45   13.01   12.65   16.65
STM       zero-shot generalizati...  2022.04   13.8    12.63   11.54   11.3    11.77   12.21
UT        zero-shot generalizati...  2022.04   12.73   12.11   11.4    11.41   11.74   11.88
RRN       zero-shot generalizati...  2022.04   11.62   11.4    11.83   11.22   11.69   11.56
RN        zero-shot generalizati...  2022.04   11.12   11.53   11.21   11.13   11.34   11.27
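As a quick consistency check, the sketch below recomputes Mean Accuracy from the five per-k columns. It assumes Mean Accuracy is the unweighted arithmetic mean over k = 6 to 10, which the page does not state explicitly. The names `RESULTS`, `mean_accuracy`, and `leaderboard` are hypothetical and not part of any benchmark tooling. Under this assumption the recomputed means match the table to two decimals for every method except RRN (11.55 recomputed vs. 11.56 reported), which may reflect unrounded per-k scores behind the published figures.

```python
# Per-k accuracies (%) copied from the table, in order k = 6, 7, 8, 9, 10.
# RESULTS is a hypothetical name used only for this illustration.
RESULTS = {
    "TP-MANN": [28.53, 26.45, 23.67, 22.52, 21.46],
    "TPR-RNN": [22.25, 19.88, 15.45, 13.01, 12.65],
    "STM": [13.8, 12.63, 11.54, 11.3, 11.77],
    "UT": [12.73, 12.11, 11.4, 11.41, 11.74],
    "RRN": [11.62, 11.4, 11.83, 11.22, 11.69],
    "RN": [11.12, 11.53, 11.21, 11.13, 11.34],
}


def mean_accuracy(accs):
    """Unweighted mean over the k = 6..10 columns (assumed definition)."""
    return sum(accs) / len(accs)


def leaderboard(results):
    """Return (method, mean accuracy) pairs sorted from best to worst."""
    rows = [(name, mean_accuracy(accs)) for name, accs in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, mean in leaderboard(RESULTS):
        # Drop from k=6 to k=10 shows how accuracy degrades with more hops.
        drop = RESULTS[name][0] - RESULTS[name][-1]
        print(f"{name:8s} mean={mean:6.2f}  drop k=6->10: {drop:5.2f}")
```

Keeping the means unrounded and formatting them only at print time avoids compounding rounding error before the comparison with the reported two-decimal values.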