Put the eggplant into the yellow basket on AutoEval Real-world held-out tasks
[Chart: Success Rate. World-Gymnast: 78 (Feb 2, 2026).]
Updated 4d ago
Evaluation Results
Method          Details                     Date      Success Rate
World-Gymnast   Backbone=600M paramete...   2026.02   78
Iter-SFT        Backbone=OpenVLA 7B         2026.02   17
SFT             Backbone=OpenVLA 7B         2026.02   8