Preparefood on Household Tasks kitchen_and_livingroom
Top result: Deepseek-R1, Original Success Rate 80.8 (Mar 9, 2026)
Metrics: Original Success Rate, Avg Action Accuracy, Total Steps
Updated 1mo ago
Evaluation Results
| Method | Date | Original Success Rate | Avg Action Accuracy | Total Steps |
|---|---|---|---|---|
| Deepseek-R1 | 2026.03 | 80.8 | 86.5 | 15 |
| Llama3.3-70B | 2026.03 | 73.4 | 83.3 | 19 |
| GPT-5-mini | 2026.03 | 67.7 | 95.5 | 17 |