Action Generation on Human Evaluation, 48 instances (test)
Best Preference Rate: 58.3 (Do-Undo)
[Chart: Preference Rate by submission date; latest entries Dec 15, 2025]
Evaluation Results
Method                          Date      Preference Rate
Do-Undo                         2025.12   58.3
BAGEL                           2025.12   41.6
Do-Undo(c) (consistency=true)   2025.12   33.3