Scientific Reasoning in Text-based Environments on ScienceWorld (test)

Task 1-1 Score: 44.8

Expert

Feb 20, 2024

Evaluation Results

Method  Links
2024.02
44.830.78.75.873.87210096.728.51722.913.78569.9836.641.566.516.917-
2024.02
11.827.226.32546.714.380.78.450.118.18.65448.812.89.224.11833.532.410
2024.02
0.7227.929.53240.710.779.88.342.817.47.86250.612.69.121.613.829.925.3-
2024.02
0.209.637.722.876.326.672.55.523.726.410.495.58172.321.712.838.92.116
2024.02
0.109.277.619.675.528.4856.349.427.311.57275.26.46.238.518.939.410.29
2024.02
00.111.333.315.464.815.164.13.736.2187.25046.44016.51021.70.2-
2024.02
0022.273.827.485.116.481.4652.529.310.482.561.46.22.431.714.444.64.3-