Embodied Task Planning on ALFWorld (Textual Observations, Seen Split)
Chart: Success Rate over time. Current best: DynaMind (92.5), Apr 9, 2026.
Evaluation Results
Method     Base Model     Date     Success Rate (%)
DynaMind   Qwen2.5-7B     2026.04  92.5
RoboAgent  Qwen2.5-VL-3B  2026.04  92.1
GiGPO      Qwen2.5-7B     2026.04  90.8
BPO        Llama-3.1-8B   2026.04  87.9
SEEA-R1    Qwen2.5-7B     2026.04  85.3
MPO        Llama-3.1-8B   2026.04  85.0
Zero-Shot  GPT-4o         2026.04  78.6
IPR        Llama-2-7B     2026.04  70.3
ETO        Llama-2-7B     2026.04  68.6
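The page does not define Success Rate. As a minimal sketch, assuming it is the percentage of evaluation episodes in which the agent completes its task, the metric and the ranking in the leaderboard above can be reproduced as follows. The function names `success_rate` and `rank` are hypothetical, and the episode outcomes in the usage example are made up for illustration; only the leaderboard scores come from this page.

```python
from collections.abc import Iterable


def success_rate(outcomes: Iterable[bool]) -> float:
    """Percentage of episodes marked successful.

    Assumption: Success Rate = 100 * (successful episodes / total episodes).
    """
    results = list(outcomes)
    if not results:
        raise ValueError("need at least one episode outcome")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(results) / len(results)


# Success Rate (%) per method, copied from the leaderboard above.
LEADERBOARD: dict[str, float] = {
    "DynaMind": 92.5,
    "RoboAgent": 92.1,
    "GiGPO": 90.8,
    "BPO": 87.9,
    "SEEA-R1": 85.3,
    "MPO": 85.0,
    "Zero-Shot": 78.6,
    "IPR": 70.3,
    "ETO": 68.6,
}


def rank(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort methods by success rate, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical outcomes for four evaluation episodes (not real data).
    print(success_rate([True, False, True, True]))  # 75.0
    for position, (method, score) in enumerate(rank(LEADERBOARD), start=1):
        print(f"{position}. {method}: {score:.1f}")
```

Sorting by score alone ignores the Base Model column, so the ranking by itself does not show how much of each result comes from the method and how much from the underlying model.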