Task Completion on ScienceWorld Unseen
Chart: Average Steps over time (latest point Oct 7, 2025; best result EAGLET, 10.6)
Evaluation Results
| Method | Executor Agent | Date | Average Steps |
|---|---|---|---|
| EAGLET | GPT-5 | 2025.10 | 10.6 |
| w/o Guidance | GPT-5 | 2025.10 | 13.1 |
| EAGLET | GPT-4.1 | 2025.10 | 14.3 |
| MPO | GPT-5 | 2025.10 | 15.5 |
| MPO | GPT-4.1 | 2025.10 | 16.5 |
| w/o Guidance | GPT-4.1 | 2025.10 | 16.7 |
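The results above pair each guided method with an unguided ("w/o Guidance") run of the same executor agent, so the effect of guidance can be expressed as a relative change in average steps. The sketch below does that arithmetic using only the numbers listed in the results. It assumes that fewer average steps means more efficient task completion; that reading, and the helper names `relative_change` and `summarize`, are illustrative assumptions rather than anything defined by the leaderboard.

```python
# Average steps on ScienceWorld Unseen, copied from the leaderboard results.
# Assumption: fewer steps = more efficient task completion.
RESULTS = {
    "GPT-5": {"w/o Guidance": 13.1, "EAGLET": 10.6, "MPO": 15.5},
    "GPT-4.1": {"w/o Guidance": 16.7, "EAGLET": 14.3, "MPO": 16.5},
}


def relative_change(baseline: float, value: float) -> float:
    """Percent change vs. the unguided baseline (negative means fewer steps)."""
    return (value - baseline) / baseline * 100.0


def summarize(results):
    """Return (executor, method, steps, percent change) for each guided method."""
    rows = []
    for executor, methods in results.items():
        baseline = methods["w/o Guidance"]
        for method, steps in methods.items():
            if method == "w/o Guidance":
                continue  # the baseline is not compared against itself
            pct = round(relative_change(baseline, steps), 1)
            rows.append((executor, method, steps, pct))
    return rows


if __name__ == "__main__":
    for executor, method, steps, pct in summarize(RESULTS):
        print(f"{executor:8} {method:7} {steps:5.1f} steps  {pct:+.1f}% vs. w/o Guidance")
```

On these figures EAGLET cuts average steps by about 19.1% with GPT-5 and 14.4% with GPT-4.1, while MPO lowers them by about 1.2% with GPT-4.1 but raises them by about 18.3% with GPT-5.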