
Multi-hop spatial reasoning on StepGame with distracting noise (test)

85.77 k=1 Accuracy

TP-MANN

Apr 18, 2022
Updated 1mo ago

Evaluation Results

Method    Date      k=1     k=2     k=3     k=4     k=5     Mean
TP-MANN   2022.04   85.77   60.31   50.18   37.45   31.25   52.99
          2022.04   70.29   46.03   36.14   26.82   24.77   40.81
          2022.04   53.42   35.96   23.03   18.45   15.14   29.20
          2022.04   45.11   28.36   17.41   14.07   13.45   23.68
          2022.04   24.05   19.98   16.03   13.22   12.31   17.12
          2022.04   22.64   17.08   15.08   12.84   11.52   15.83
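
The last value in each result row appears to be the arithmetic mean of the five accuracies before it. The page itself does not label that value, so this reading is an assumption. A minimal Python sketch to check it, with the numbers transcribed from the rows above and a hypothetical helper name `mean_matches`:

```python
# Each entry: (five per-setting accuracies, last value reported in the row).
# Values are transcribed from the results above; treating the last value
# as a mean is an assumption, not something the page states.
rows = [
    ([85.77, 60.31, 50.18, 37.45, 31.25], 52.99),
    ([70.29, 46.03, 36.14, 26.82, 24.77], 40.81),
    ([53.42, 35.96, 23.03, 18.45, 15.14], 29.20),
    ([45.11, 28.36, 17.41, 14.07, 13.45], 23.68),
    ([24.05, 19.98, 16.03, 13.22, 12.31], 17.12),
    ([22.64, 17.08, 15.08, 12.84, 11.52], 15.83),
]


def mean_matches(accuracies, reported, tol=0.01):
    """Return True if the mean of `accuracies` is within `tol` of `reported`."""
    mean = sum(accuracies) / len(accuracies)
    return abs(mean - reported) <= tol


for accuracies, reported in rows:
    mean = sum(accuracies) / len(accuracies)
    print(f"computed {mean:.2f}  reported {reported:.2f}  "
          f"match={mean_matches(accuracies, reported)}")
```

The tolerance of 0.01 allows for the two-decimal rounding used on the page.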