Task Completion on ALFWorld Seen
[Chart: Average Steps. EAGLET, 8.6, Oct 7, 2025]
Updated 1mo ago
Evaluation Results
Method        Executor Agent  Date     Average Steps
------------  --------------  -------  -------------
EAGLET        GPT-5           2025.10  8.6
EAGLET        GPT-4.1         2025.10  9.4
MPO           GPT-5           2025.10  9.7
w/o Guidance  GPT-5           2025.10  10.4
MPO           GPT-4.1         2025.10  10.6
w/o Guidance  GPT-4.1         2025.10  10.8