Multi-agent planning and execution on MAT2-THOR Simple Tasks
Top result: Scale Plan, TCR 92 (Mar 9, 2026)
Metrics: TCR, GCR, Error Rate (ER)
Updated 1mo ago
Evaluation Results
Method                     Backbone   Date      TCR   GCR   Error Rate (ER)
Scale Plan                 GPT-5.2    2026.03   92    95    98
LaMMA-P (LLM corrected)    GPT-5.2    2026.03   76    84    91
LLM+P (PDDL based)         GPT-5.2    2026.03   48    57    74
LLM as a Planner           GPT-5.2    2026.03   44    56    79
LaMMA-P (PDDL plan only)   GPT-5.2    2026.03   36    56    79