Readbook on Household Tasks (bedroom_and_kitchen)
[Chart: best Success Rate 85.4, Deepseek-R1 (Mar 9, 2026); selectable metrics: Success Rate, Average Action Accuracy, Total Steps.]
Updated 1mo ago
Evaluation Results
Method        Configuration       Date     Success Rate  Average Action Accuracy  Total Steps
Deepseek-R1   Model=Deepseek-R1   2026.03  85.4          72.5                     8
Llama3.3-70B  Model=Llama3.3-70B  2026.03  79.7          83.3                     7
GPT-5-mini    Model=GPT-5-mini    2026.03  76.7          64.3                     5