Readbook on VirtualHome bedroom_and_bathroom
Top TSR: 83.1 (Deepseek-R1, Mar 9, 2026)
Updated 1mo ago
Evaluation Results
Method        Date     TSR   TSR_R  TSR_C  Error Rate
Deepseek-R1   2026.03  83.1  95.2   100    12.8
Llama3.3-70B  2026.03  76.5  88.6   60     50.4
GPT-5-mini    2026.03  72.2  94.9   100    75.2