Putfridge on VirtualHome bathroom_and_livingroom
Best TSR: 81.9 (Deepseek-R1, Mar 9, 2026)
Metrics: TSR, TSR_R, TSR_C, Error Rate
Updated 1mo ago
Evaluation Results
| Method | Date | TSR | TSR_R | TSR_C | Error Rate |
|---|---|---|---|---|---|
| Deepseek-R1 | 2026.03 | 81.9 | 93.9 | 100 | 80.8 |
| Llama3.3-70B | 2026.03 | 74.9 | 85.9 | 70 | 96.2 |
| GPT-5-mini | 2026.03 | 69.9 | 92.8 | 90 | 42.3 |