
AlfWorld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Embodied Reasoning | ALFWorld | Accuracy 0.96 | 151 |
| Interactive Decision-making | ALFWorld | Overall Success Rate 97.71 | 118 |
| Embodied Task | ALFWorld | Overall Success Rate 96.9 | 96 |
| Instruction Following | ALFWorld | Accuracy 89.3 | 82 |
| Interactive Decision Making | ALFWorld (test) | Success Rate 96.87 | 67 |
| Embodied decision-making | ALFWorld | Success Rate 82.84 | 31 |
| Interactive Task Completion | ALFWorld | Pick Success Rate 100 | 28 |
| Sequential Decision Making | ALFWorld (test) | Success Rate 96.27 | 26 |
| Interactive Decision Making | ALFWorld Unseen | Success Rate 97.76 | 23 |
| Interactive Decision Making | ALFWorld Seen | Success Rate 97.86 | 23 |
| Embodied Interaction | ALFWorld | Success Rate 94.78 | 22 |
| Decision Making | AlfWorld | Steps 6.4 | 22 |
| Agentic Reasoning | ALFWorld (test) | Success Rate 97.7 | 21 |
| Agent Task | AlfWorld | Success Rate 83.6 | 21 |
| Household Agent Interaction | ALFWorld | Pick Success Rate 99.1 | 20 |
| Interactive environment task success | ALFWorld (test) | Overall Success Rate 91.79 | 20 |
| Interactive Embodied Agent Task | ALFWorld (val) | Pick Success Rate 100 | 19 |
| Trajectory Unlearning | AlfWorld | Unlearn Efficacy 100 | 18 |
| Multi-turn decision making | ALFWorld (val-unseen) | Success Rate 94.1 | 18 |
| Embodied AI Reasoning | ALFWorld | CoT Match Rate 100 | 18 |
| Interactive Reasoning | ALFWorld | Average Reasoning Length (tokens) 47.9 | 18 |
| Agentic task completion | ALFWorld | Look Success 100 | 18 |
| Agent Behavior Adaptation | AlfWorld (AW) (test) | Loop Ratio 1,040 | 17 |
| Embodied Task Planning | ALFWorld (test) | Success Rate (Avg) 98.7 | 17 |
| Next-state prediction | ALFWorld (AW) | EM Accuracy 99.87 | 16 |
Showing 25 of 69 rows