Embodied Task Completion on AI2-THOR (test)
Chart: SR by result date (top result: ReCAPA, SR 75, Apr 23, 2026). Metric options: SR, TR, Coverage, Balance.
Updated 1mo ago
Evaluation Results
(SR = success rate, TR = transport rate; all scores in %)
Method     Model Category  Date     SR  TR  Coverage  Balance
ReCAPA     Multi-M...      2026.04  75  93  95        93
LLaMAR     Multi-M...      2026.04  68  90  95        85
GPT-4V     Multi-M...      2026.04  66  91  97        82
CogVLM     Multi-M...      2026.04  61  89  95        80
IDEFICS-2  Multi-M...      2026.04  57  86  94        78
LLaVA      Multi-M...      2026.04  54  84  91        75
GPT-4o     Multi-M...      2026.04  51  85  95        83
ReAct      Single-...      2026.04  34  72  92        67
CoELA      Single-...      2026.04  25  46  76        73
CoT        Single-...      2026.04  14  59  87        62
SmartLLM   Single-...      2026.04  11  23  91        45