Put the eggplant into the blue sink on AutoEval Real-world held-out tasks
[Chart: Success Rate over time. Best result: World-Gymnast, 72 (Feb 2, 2026)]
Evaluation Results
Method          Backbone               Date      Success Rate
World-Gymnast   600M paramete...       2026.02   72
Iter-SFT        OpenVLA 7B             2026.02   10
SFT             OpenVLA 7B             2026.02   4