
Interactive Scientific Exploration on ScienceWorld (standard 30-task protocol)

Best Average Score (Short): 73.72 (SDP)

Updated 2mo ago

Evaluation Results

Method	Date	Average Score (Short)	(unlabeled)	(unlabeled)	(unlabeled)
SDP	2026.05	73.72	53.5	50.41	59.16
Reflexion	2026.05	71.47	35.43	30.17	45.34
Plan-and-Act	2026.05	60.52	46.43	34.77	47.86
CoT	2026.05	49.54	47.87	23.09	39.23
ReAct	2026.05	48.79	44.01	21.07	36.43
EVOAGENT	2026.05	48.67	36.17	11.38	30.42
SayCan	2026.05	43.83	36.58	23.65	33.82
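The rows above can be compared programmatically. The following is a minimal sketch that transcribes the table into Python and ranks methods by a chosen metric column. Only the first metric column is labeled on the page (Average Score (Short)); the helper names `rank_by` and `margin_over_runner_up` are hypothetical and chosen for illustration only.

```python
# Rows transcribed from the results table: (method, avg_score_short, m2, m3, m4).
# Only the first metric is labeled on the page; m2..m4 are placeholder names.
ROWS = [
    ("SDP", 73.72, 53.5, 50.41, 59.16),
    ("Reflexion", 71.47, 35.43, 30.17, 45.34),
    ("Plan-and-Act", 60.52, 46.43, 34.77, 47.86),
    ("CoT", 49.54, 47.87, 23.09, 39.23),
    ("ReAct", 48.79, 44.01, 21.07, 36.43),
    ("EVOAGENT", 48.67, 36.17, 11.38, 30.42),
    ("SayCan", 43.83, 36.58, 23.65, 33.82),
]


def rank_by(column: int) -> list:
    """Return method names sorted by metric column 1-4, highest score first."""
    return [row[0] for row in sorted(ROWS, key=lambda r: r[column], reverse=True)]


def margin_over_runner_up(column: int) -> float:
    """Difference between the best and second-best score in a metric column."""
    scores = sorted((row[column] for row in ROWS), reverse=True)
    # Round to two decimals to match the table's precision and hide float noise.
    return round(scores[0] - scores[1], 2)


if __name__ == "__main__":
    print(rank_by(1))                 # order by Average Score (Short)
    print(rank_by(4))                 # order by the last (unlabeled) column
    print(margin_over_runner_up(1))   # SDP's lead on Average Score (Short)
```

Ranking by the first column reproduces the order shown on the page, with SDP ahead of Reflexion by 2.25 points. Ranking by the last column keeps SDP first but moves Plan-and-Act ahead of Reflexion and SayCan ahead of EVOAGENT, so the column used for sorting affects how the rest of the leaderboard reads.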