
Scientific Reasoning on ScienceWorld Seen

71.6 Average Reward

Llama-2-7B-Chat + RFT

Updated 4mo ago

Evaluation Results

Method	Date	Average Reward
Llama-2-7B-Chat + RFT	2025.11	71.6
Co-Evolving Agents	2025.11	69.7
Llama-2-7B-Chat + ETO	2025.11	65.6
Co-Evolving Agents	2025.11	65.1
Llama-2-7B-Chat + PPO	2025.11	59.4
ETO	2025.11	58.6
Llama-2-7B-Chat + SFT	2025.11	47.3
SFT	2025.11	43.6
GPT-4	2025.11	42.9
GPT-3.5-Turbo	2025.11	7.9