Putdishwasher on VirtualHome livingroom_and_bedroom
Top TSR: 82.8 (Deepseek-R1), Mar 9, 2026
[Chart omitted. Metrics: TSR, TSR_R, TSR_C, Error Rate (ER)]
Updated 1mo ago
Evaluation Results
| Method | Date | TSR | TSR_R | TSR_C | Error Rate (ER) |
| Deepseek-R1 | 2026.03 | 82.8 | 94 | 100 | 99.2 |
| Llama3.3-70B | 2026.03 | 76.1 | 86 | 70 | 83.3 |
| GPT-5-mini | 2026.03 | 71.5 | 92.8 | 70 | 59.5 |