Robotic Task Planning in Dynamic Environments on VirtualHome
[Chart: Success Rate and APP. Top entry: LookPlanGraph, Success Rate 92, Dec 24, 2025.]
Evaluation Results

Method         Configuration               Date     Success Rate  APP
LookPlanGraph  Backbone=Llama3.2, Gro...   2025.12  92            96
LookPlanGraph  Backbone=GPT-4o, Groun...   2025.12  86            92
LookPlanGraph  Backbone=Llama3.2, Gro...   2025.12  60            68
LookPlanGraph  Backbone=GPT-4o, Groun...   2025.12  52            63
ReAct          Backbone=Llama3.2, Gro...   2025.12  50            74
SayPlan Lite   Backbone=GPT-4o, Groun...   2025.12  48            67
LLM-as-P       Backbone=GPT-4o, Groun...   2025.12  44            65
SayPlan Lite   Backbone=Llama3.2, Gro...   2025.12  39            65
SayPlan        Backbone=GPT-4o, Groun...   2025.12  38            59
ReAct          Backbone=GPT-4o, Groun...   2025.12  34            58
LLM+P          Backbone=GPT-4o, Groun...   2025.12  32            58
ReAct          Backbone=Llama3.2, Gro...   2025.12  30            64
ReAct          Backbone=GPT-4o, Groun...   2025.12  22            50
SayPlan        Backbone=Llama3.2, Gro...   2025.12  21            53
LLM-as-P       Backbone=Llama3.2, Gro...   2025.12  16            53
LLM+P          Backbone=Llama3.2, Gro...   2025.12  0             38