Multi-agent planning and execution on MAT2-THOR Vague Tasks
[Chart: Task Completion Rate (TCR) over time. Highlighted entry: Scale Plan, TCR 71, Mar 9, 2026. Selectable metrics: Task Completion Rate (TCR), Goal Completion Rate (GCR), Error Rate (ER).]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Task Completion Rate (TCR) | Goal Completion Rate (GCR) | Error Rate (ER) |
|---|---|---|---|---|---|
| Scale Plan | GPT-5.2 | 2026.03 | 71 | 71 | 82 |
| LLM as a Planner | GPT-5.2 | 2026.03 | 57 | 57 | 88 |
| LaMMA-P (PDDL plan only) | GPT-5.2 | 2026.03 | 57 | 64 | 77 |
| LLM+P (PDDL based) | GPT-5.2 | 2026.03 | 43 | 52 | 58 |
| LaMMA-P (LLM corrected) | GPT-5.2 | 2026.03 | 43 | 50 | 85 |