Interactive Agent Task on ScienceWorld Seen (Average Reward)
[Chart: Average Reward over time. Top result: BPO, 77.1 (Aug 5, 2025).]
Updated 15d ago
Evaluation Results
| Method | Method details | Date | Average Reward |
| --- | --- | --- | --- |
| BPO | Approach=System-2, Eva... | 2025.08 | 77.1 |
| MPO | Approach=System-2, Eva... | 2025.08 | 71.61 |
| ETO | Approach=System-2, Eva... | 2025.08 | 65.69 |
| Deepseek-R1 | Approach=System-2, Eva... | 2025.08 | 63.96 |
| SFT | Approach=System-2, Eva... | 2025.08 | 58.82 |
| o3-mini | Approach=System-2, Eva... | 2025.08 | 56.95 |
| Qwen-3-Thinking | Approach=System-2, Eva... | 2025.08 | 52.05 |
| Qwen-2.5-7B-Instruct | Approach=System-1, Eva... | 2025.08 | 26.68 |
| Llama-3.1-8B-Instruct | Approach=System-1, Eva... | 2025.08 | 26.64 |