Embodied Task Completion on ALFWorld
[Chart: Success Rate and Path Score by method. Top Success Rate: 94 (ReAct). Date shown: Mar 10, 2026.]
Updated 1mo ago
Evaluation Results
Method     Backbone       Date     Success Rate  Path Score
ReAct      DeepSeek-R1    2026.03  94            90.3
AutoAgent  DeepSeek-R1    2026.03  91            87.3
AutoAgent  QwQ-32B        2026.03  82.8          82
DeepAgent  QwQ-32B        2026.03  79.9          87.9
AutoAgent  Qwen3-30B-A3B  2026.03  58.2          61.2
DeepAgent  Qwen3-30B-A3B  2026.03  48.5          76.6
ReAct      QwQ-32B        2026.03  41.8          40.3
DeepAgent  DeepSeek-R1    2026.03  40.3          62.4
ReAct      Qwen3-30B-A3B  2026.03  0             0