Desktop Cleaning on Real-world robot tasks 1.0 (test)
[Chart: Success Rate over time. Best result: GPT-4o, 8 (May 26, 2026)]
Evaluation Results
Method        Tool Augmentation   Date      Success Rate
GPT-4o        w/                  2026.05   8
GPT-5         w/                  2026.05   7
Claude-3.7    w/                  2026.05   6
Claude-4.6    w/                  2026.05   6
Qwen3-8B      w/                  2026.05   6
Qwen3-32B     w/                  2026.05   6
Qwen2.5-32B   w/                  2026.05   5
Qwen3.5-35B   w/                  2026.05   4
GPT-4o        w/o                 2026.05   0
GPT-5         w/o                 2026.05   0
Claude-3.7    w/o                 2026.05   0
Claude-4.6    w/o                 2026.05   0
Qwen3-8B      w/o                 2026.05   0
Qwen3-32B     w/o                 2026.05   0
Qwen2.5-32B   w/o                 2026.05   0
Qwen3.5-35B   w/o                 2026.05   0
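To make the tool-augmentation effect in these results easy to check, here is a minimal Python sketch that transcribes the rows above and computes the mean Success Rate per setting and the per-model gain from tools. The names `RESULTS`, `mean_by_setting`, and `gain_per_model` are hypothetical helpers for illustration only; the page does not state the units of Success Rate, so the numbers are used as given.

```python
from statistics import mean

# Rows transcribed from the leaderboard: (method, tool augmentation, success rate).
# Units of "Success Rate" are not stated on the page; values are copied as-is.
RESULTS = [
    ("GPT-4o", "w/", 8), ("GPT-5", "w/", 7), ("Claude-3.7", "w/", 6),
    ("Claude-4.6", "w/", 6), ("Qwen3-8B", "w/", 6), ("Qwen3-32B", "w/", 6),
    ("Qwen2.5-32B", "w/", 5), ("Qwen3.5-35B", "w/", 4),
    ("GPT-4o", "w/o", 0), ("GPT-5", "w/o", 0), ("Claude-3.7", "w/o", 0),
    ("Claude-4.6", "w/o", 0), ("Qwen3-8B", "w/o", 0), ("Qwen3-32B", "w/o", 0),
    ("Qwen2.5-32B", "w/o", 0), ("Qwen3.5-35B", "w/o", 0),
]


def mean_by_setting(rows):
    """Average Success Rate for each tool-augmentation setting (hypothetical helper)."""
    groups = {}
    for _, setting, score in rows:
        groups.setdefault(setting, []).append(score)
    return {setting: mean(scores) for setting, scores in groups.items()}


def gain_per_model(rows):
    """Success Rate with tools minus without tools, per method (hypothetical helper)."""
    by_model = {}
    for method, setting, score in rows:
        by_model.setdefault(method, {})[setting] = score
    return {m: s["w/"] - s["w/o"] for m, s in by_model.items()}


if __name__ == "__main__":
    print(mean_by_setting(RESULTS))
    print(gain_per_model(RESULTS))
```

Because every method scores 0 without tool augmentation, each model's gain equals its with-tools score, and the with-tools mean is 48 / 8 = 6.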