Putdishwasher on Household Tasks bedroom and kitchen
Top result: Deepseek-R1, Success Rate 79.2 (Mar 9, 2026)
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Success Rate | Action Accuracy | Total Steps | Success Rate (R) | Success Rate (C) | Error Rate |
|---|---|---|---|---|---|---|---|---|
| Deepseek-R1 | Model=Deepseek-R1 | 2026.03 | 79.2 | 86.7 | 13 | - | - | - |
| Llama3.3-70B | Model=Llama3.3-70B | 2026.03 | 71.1 | 86.6 | 9 | - | - | - |
| GPT-5-mini | Model=GPT-5-mini | 2026.03 | 64.5 | 75.5 | 10 | - | - | - |
| Deepseek-R1 | Model=Deepseek-R1 | 2026.03 | 0.792 | - | - | 0.95 | 1 | 0.794 |
| Llama3.3-70B | Model=Llama3.3-70B | 2026.03 | 0.711 | - | - | 0.882 | 0.8 | 0.317 |
| GPT-5-mini | Model=GPT-5-mini | 2026.03 | 0.645 | - | - | 0.945 | 0.9 | 0.595 |