Story Generation on Story
[Chart: Quality and Diversity over time. Highlighted point: Post-trained, Quality 122.6. Date shown on the x-axis: Feb 6, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | Quality | Diversity |
|---|---|---|---|---|
| Post-trained | Gemma | 2026.02 | 122.6 | 0.294 |
| SLR | Gemma | 2026.02 | 117.7 | 0.364 |
| Proxy-Soup | Gemma | 2026.02 | 117.5 | 0.326 |
| Post-trained | Qwen | 2026.02 | 94.7 | 0.152 |
| SLR | Qwen | 2026.02 | 93.8 | 0.19 |
| Post-trained | Llama | 2026.02 | 93.4 | 0.274 |
| SLR | Llama | 2026.02 | 90.8 | 0.366 |
| Proxy-Soup | Llama | 2026.02 | 87.8 | 0.297 |
| Proxy-Soup | Qwen | 2026.02 | 80.4 | 0.179 |