Setuptable on VirtualHome livingroom_and_bedroom
Chart: Task Success Rate (TSR) over time (x-axis date: Mar 9, 2026); current best 82.7 (Deepseek-R1). Other selectable metrics: Task Success Rate (TSR_R), Task Success Rate (TSR_C), Error Rate (ER).
Updated 1mo ago
Evaluation Results
Method         Configuration          Date      TSR    TSR_R   TSR_C   ER
Deepseek-R1    Model=Deepseek-R1      2026.03   82.7   94.5    100     51
Llama3.3-70B   Model=Llama3.3-70B     2026.03   76     87      60      11.8
GPT-5-mini     Model=GPT-5-mini       2026.03   71.4   94      100     39.2

TSR, TSR_R, TSR_C: Task Success Rate variants; ER: Error Rate.