Scientific Reasoning on ScienceWorld Seen

71.6 Average Reward

Llama-2-7B-Chat + RFT

[Chart omitted. Axis: Average Reward; date shown: Nov 27, 2025]
Updated 4d ago

Evaluation Results

Date      Average Reward
2025.11   71.6
2025.11   69.7
2025.11   65.6
2025.11   65.1
2025.11   59.4
2025.11   58.6
2025.11   47.3
2025.11   43.6
2025.11   42.9
2025.11   7.9