Scientific Reasoning on ScienceWorld Unseen

62 Average Reward

Co-Evolving Agents

Updated 1mo ago

Evaluation Results

Method	Links	Average Reward
Co-Evolving Agents 2025.11		62
Co-Evolving Agents 2025.11		58.5
Llama-2-7B-Chat + ETO 2025.11		55.5
ETO 2025.11		55.2
Llama-2-7B-Chat + RFT 2025.11		54.3
Llama-2-7B-Chat + PPO 2025.11		51.7
Llama-2-7B-Chat + SFT 2025.11		41.9
SFT 2025.11		40.8
GPT-4 2025.11		38.1
GPT-3.5-Turbo 2025.11		10.5
DELTAMEM 2026.06		0.8688
Synapse 2026.06		0.8558
AWM 2026.06		0.7453
No Memory 2026.06		0.7186
RBank 2026.06		0.6898