Multi-hop spatial reasoning on StepGame with distracting noise (test)
[Chart: k=1 Accuracy by date. Top entry: TP-MANN, 85.77 (Apr 18, 2022). Metrics: k=1 Accuracy, k=2 Accuracy, k=3 Accuracy, k=4 Accuracy, k=5 Accuracy, Mean Accuracy.]
Updated 1mo ago
Evaluation Results
| Method | Protocol | Date | k=1 Accuracy | k=2 Accuracy | k=3 Accuracy | k=4 Accuracy | k=5 Accuracy | Mean Accuracy |
|---|---|---|---|---|---|---|---|---|
| TP-MANN | Mean±Std over... | 2022.04 | 85.77 | 60.31 | 50.18 | 37.45 | 31.25 | 52.99 |
| TPR-RNN | Mean±Std over... | 2022.04 | 70.29 | 46.03 | 36.14 | 26.82 | 24.77 | 40.81 |
| STM | Mean±Std over... | 2022.04 | 53.42 | 35.96 | 23.03 | 18.45 | 15.14 | 29.2 |
| UT | Mean±Std over... | 2022.04 | 45.11 | 28.36 | 17.41 | 14.07 | 13.45 | 23.68 |
| RRN | Mean±Std over... | 2022.04 | 24.05 | 19.98 | 16.03 | 13.22 | 12.31 | 17.12 |
| RN | Mean±Std over... | 2022.04 | 22.64 | 17.08 | 15.08 | 12.84 | 11.52 | 15.83 |
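The page does not state how Mean Accuracy is derived. The reported values are consistent with a plain arithmetic mean of the five per-k accuracies, and the short sketch below checks that assumption. The names `per_k_accuracy`, `reported_mean`, and `mean_accuracy` are illustrative only; the numbers are copied from the results above.

```python
# Assumption: Mean Accuracy is the unweighted average of k=1..k=5 accuracy.
# The page does not state this; the check below only tests consistency.

# Per-k accuracies (k=1 through k=5) as listed in the results.
per_k_accuracy = {
    "TP-MANN": [85.77, 60.31, 50.18, 37.45, 31.25],
    "TPR-RNN": [70.29, 46.03, 36.14, 26.82, 24.77],
    "STM": [53.42, 35.96, 23.03, 18.45, 15.14],
    "UT": [45.11, 28.36, 17.41, 14.07, 13.45],
    "RRN": [24.05, 19.98, 16.03, 13.22, 12.31],
    "RN": [22.64, 17.08, 15.08, 12.84, 11.52],
}

# Mean Accuracy values as reported on the page.
reported_mean = {
    "TP-MANN": 52.99,
    "TPR-RNN": 40.81,
    "STM": 29.2,
    "UT": 23.68,
    "RRN": 17.12,
    "RN": 15.83,
}


def mean_accuracy(scores):
    """Return the unweighted arithmetic mean of a list of accuracies."""
    return sum(scores) / len(scores)


for method, scores in per_k_accuracy.items():
    computed = mean_accuracy(scores)
    # Rounding to two decimals mirrors the precision shown on the page.
    matches = abs(computed - reported_mean[method]) < 0.01
    print(f"{method:8s} computed={computed:.2f} "
          f"reported={reported_mean[method]:.2f} match={matches}")
```

If a different averaging scheme were in use (for example, weighting by the number of test examples per k), the computed and reported values would generally diverge beyond rounding error.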