
Multi-hop spatial reasoning on StepGame larger k generalization (test)

TP-MANN: 28.53 accuracy (k=6), Apr 18, 2022
Updated 1mo ago

Evaluation Results

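| Date | Method | k=6 | k=7 | k=8 | k=9 | k=10 | Mean |
|---|---|---|---|---|---|---|---|
| 2022.04 | TP-MANN | 28.53 | 26.45 | 23.67 | 22.52 | 21.46 | 24.53 |
| 2022.04 | | 22.25 | 19.88 | 15.45 | 13.01 | 12.65 | 16.65 |
| 2022.04 | | 13.8 | 12.63 | 11.54 | 11.3 | 11.77 | 12.21 |
| 2022.04 | | 12.73 | 12.11 | 11.4 | 11.41 | 11.74 | 11.88 |
| 2022.04 | | 11.62 | 11.4 | 11.83 | 11.22 | 11.69 | 11.56 |
| 2022.04 | | 11.12 | 11.53 | 11.21 | 11.13 | 11.34 | 11.27 |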