
AlfWorld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Decision-making | ALFWorld | Overall Success Rate: 99.6 | 295 |
| Embodied Task | ALFWorld | Overall Success Rate: 97.5 | 169 |
| Embodied Reasoning | ALFWorld | Accuracy: 0.96 | 151 |
| Embodied Task Completion | ALFWorld | Success Rate: 94 | 96 |
| Instruction Following | ALFWorld | Accuracy: 89.3 | 82 |
| Interactive Decision Making | ALFWorld (test) | Success Rate: 96.87 | 71 |
| Embodied Decision Making | ALFWorld held-out (test) | Score: 95.5 | 49 |
| Agentic reasoning | ALFWorld | Success Rate: 76.02 | 45 |
| Interactive Task Completion | ALFWorld | Pick Success Rate: 100 | 45 |
| Embodied Agent Task | ALFWorld Unseen | Success Rate: 79.1 | 40 |
| Agent Task | AlfWorld | Success Rate: 86.7 | 40 |
| Instruction Following | ALFWorld (val seen) | Success Rate (SR): 88.57 | 39 |
| Embodied Instruction Following | AlfWorld | Average Success Rate: 99.3 | 33 |
| Interactive Decision Making | ALFWorld Unseen | Success Rate: 97.76 | 32 |
| Interactive Decision Making | ALFWorld Seen | Success Rate: 97.86 | 32 |
| Multi-turn Agent Interaction | ALFWorld (test) | Success Rate (Pick): 100 | 31 |
| Interactive Environment Task Completion | ALFWorld (Unseen) | Average Reward: 91.8 | 31 |
| Interactive Environment Task Completion | ALFWorld (Seen) | Average Reward: 90.2 | 31 |
| Embodied Agent | ALFWorld | Success Rate: 100 | 31 |
| Embodied decision-making | ALFWorld | Success Rate: 82.84 | 31 |
| Mean Reward | ALFWorld | Mean Reward: 0.767 | 30 |
| Text-based embodied AI | ALFWorld | Pick Success: 100 | 30 |
| Multi-turn planning | ALFWorld (test) | Reward: 97.9 | 30 |
| Embodied Task Execution | ALFWorld | Success Rate: 93.28 | 29 |
| Interactive Instruction Following | ALFWorld Unseen | Success Rate: 86.68 | 28 |
Showing 25 of 132 rows