Preparefood on VirtualHome kitchen and livingroom
[Chart: TSR by method. Highlighted point: Deepseek-R1, TSR 80.8, Mar 9, 2026. Selectable metrics: TSR, TSR_R, TSR_C, ER.]
Updated 1mo ago
Evaluation Results
Method         Configuration         Date      TSR    TSR_R   TSR_C   ER
Deepseek-R1    Model=Deepseek-R1     2026.03   80.8   96.2    100     83.9
Llama3.3-70B   Model=Llama3.3-70B    2026.03   73.4   90.5    80      47.4
GPT-5-mini     Model=GPT-5-mini      2026.03   67.7   96.6    100     73