AutoEval Real-world held-out tasks
Top result: World-Gymnast, Success Rate 0.58
[Chart: Success Rate by date; x-axis tick at Feb 2, 2026]
Updated 4d ago
Evaluation Results

Method        | Configuration             | Date    | Success Rate
World-Gymnast | Backbone=600M paramete... | 2026.02 | 0.58
SFT           | Backbone=OpenVLA 7B, F... | 2026.02 | 0.4
Iter-SFT      | Backbone=OpenVLA 7B, S... | 2026.02 | 0.3