Share your thoughts, 1 month free Claude Pro on us
See more
Home
/
Benchmarks
Multi-agent planning and execution on MAT2-THOR Complex Tasks
Loading...
59
TCR
Scale Plan
16.36
27.43
38.5
49.57
Mar 9, 2026
TCR
GCR
Error Rate (ER)
Updated 1mo ago
Evaluation Results
Method
Method
Links
TCR
GCR
Error Rate (ER)
Scale Plan
Backbone=GPT-5.2
2026.03
59
75
92
LLM+P (PDDL based)
Backbone=GPT-5.2
2026.03
24
39
61
LaMMA-P (LLM corrected)
Backbone=GPT-5.2
2026.03
24
56
77
LLM as a Planner
Backbone=GPT-5.2
2026.03
18
40
69
LaMMA-P (PDDL plan only)
Backbone=GPT-5.2
2026.03
18
37
66
Feedback
Search any
task
Search any
task