Building Blocks on Real-world robot tasks 1.0 (test)
[Chart: Success Rate. Highlighted point: Claude-3.7, Success Rate 80, May 26, 2026]
Updated 7d ago
Evaluation Results

| Method | Tool Augmentation | Date | Success Rate |
|---|---|---|---|
| Claude-3.7 | w/ | 2026.05 | 80 |
| Claude-4.6 | w/ | 2026.05 | 80 |
| Qwen2.5-32B | w/ | 2026.05 | 70 |
| GPT-4o | w/ | 2026.05 | 60 |
| GPT-5 | w/ | 2026.05 | 60 |
| Qwen3-8B | w/ | 2026.05 | 50 |
| Qwen3-32B | w/ | 2026.05 | 50 |
| Qwen3.5-35B | w/ | 2026.05 | 50 |
| GPT-4o | w/o | 2026.05 | 0 |
| GPT-5 | w/o | 2026.05 | 0 |
| Claude-3.7 | w/o | 2026.05 | 0 |
| Claude-4.6 | w/o | 2026.05 | 0 |
| Qwen3-8B | w/o | 2026.05 | 0 |
| Qwen3-32B | w/o | 2026.05 | 0 |
| Qwen2.5-32B | w/o | 2026.05 | 0 |
| Qwen3.5-35B | w/o | 2026.05 | 0 |