Task Completion on ScienceWorld Seen
[Chart: Average Steps by date. Plotted point: EAGLET, 10.2 Average Steps, Oct 7, 2025.]
Updated 1mo ago
Evaluation Results
Method         Executor Agent   Date      Average Steps
EAGLET         GPT-5            2025.10   10.2
w/o Guidance   GPT-5            2025.10   11.3
MPO            GPT-5            2025.10   12.1
EAGLET         GPT-4.1          2025.10   12.2
MPO            GPT-4.1          2025.10   13.6
w/o Guidance   GPT-4.1          2025.10   14.4