Balance Scale Manipulation on Real-world robot tasks 1.0 (test)
[Chart: Success Rate by date. Labeled point: Claude-4.6, Success Rate 90, May 26, 2026.]
Updated 7d ago
Evaluation Results
| Method      | Tool Augmentation | Date    | Success Rate |
|-------------|-------------------|---------|--------------|
| Claude-4.6  | w/                | 2026.05 | 90           |
| GPT-5       | w/                | 2026.05 | 80           |
| Claude-3.7  | w/                | 2026.05 | 80           |
| Qwen3-32B   | w/                | 2026.05 | 80           |
| Qwen3.5-35B | w/                | 2026.05 | 70           |
| GPT-4o      | w/                | 2026.05 | 60           |
| Qwen2.5-32B | w/                | 2026.05 | 60           |
| Qwen3-8B    | w/                | 2026.05 | 40           |
| GPT-4o      | w/o               | 2026.05 | 0            |
| GPT-5       | w/o               | 2026.05 | 0            |
| Claude-3.7  | w/o               | 2026.05 | 0            |
| Claude-4.6  | w/o               | 2026.05 | 0            |
| Qwen3-8B    | w/o               | 2026.05 | 0            |
| Qwen3-32B   | w/o               | 2026.05 | 0            |
| Qwen2.5-32B | w/o               | 2026.05 | 0            |
| Qwen3.5-35B | w/o               | 2026.05 | 0            |