Share your thoughts, 1 month free Claude Pro on us
See more
Home
/
Benchmarks
Multi-agent planning and execution on MAT2-THOR Overall Tasks
Loading...
78
TCR
Scale Plan
31.2
43.35
55.5
67.65
Mar 9, 2026
TCR
GCR
Err
Updated 1mo ago
Evaluation Results
Method
Method
Links
TCR
GCR
Err
Scale Plan
Backbone=GPT-5.2
2026.03
78
85
94
LaMMA-P (LLM corrected)
Backbone=GPT-5.2
2026.03
53
69
85
LLM+P (PDDL based)
Backbone=GPT-5.2
2026.03
39
50
67
LLM as a Planner
Backbone=GPT-5.2
2026.03
37
51
77
LaMMA-P (PDDL plan only)
Backbone=GPT-5.2
2026.03
33
51
74
Feedback
Search any
task
Search any
task