Close the drawer on AutoEval Real-world held-out tasks
[Chart: Success Rate over time. Highlighted result: SFT, 62 (Feb 2, 2026).]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | Success Rate |
| World-Gymnast | 600M paramete... | 2026.02 | 62 |
| SFT | OpenVLA 7B | 2026.02 | 62 |
| Iter-SFT | OpenVLA 7B | 2026.02 | 60 |