Story Generation on Story Generation Gender (US) Even distribution
[Chart: MAE by date. Highlighted point: Exp, MAE 0.26, Apr 7, 2026.]
Updated 10d ago
Evaluation Results
| Method | Backbone | Date | MAE |
|--------|----------|---------|------|
| Exp | Llama1B | 2026.04 | 0.26 |
| DPO | Qwen1.5B | 2026.04 | 0.28 |
| IFT | Qwen1.5B | 2026.04 | 0.31 |
| Ours | Qwen1.5B | 2026.04 | 0.31 |
| Zero | Qwen1.5B | 2026.04 | 0.32 |
| DPO | Llama1B | 2026.04 | 0.32 |
| Ours | Llama1B | 2026.04 | 0.32 |
| Exp | Qwen1.5B | 2026.04 | 0.33 |
| Zero | Llama1B | 2026.04 | 0.33 |
| IFT | Llama1B | 2026.04 | 0.34 |
| Exp | Qwen7B | 2026.04 | 0.35 |
| Exp | Llama8B | 2026.04 | 0.35 |
| DPO | Llama8B | 2026.04 | 0.35 |
| DPO | Qwen7B | 2026.04 | 0.36 |
| IFT | Qwen7B | 2026.04 | 0.37 |
| IFT | Llama8B | 2026.04 | 0.37 |
| Ours | Llama8B | 2026.04 | 0.37 |
| Zero | Qwen7B | 2026.04 | 0.38 |
| Ours | Qwen7B | 2026.04 | 0.38 |
| Zero | Llama8B | 2026.04 | 0.38 |
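
The page reports MAE (mean absolute error) but does not say how it is computed for this benchmark. As a rough illustration only, the sketch below assumes the score compares the observed share of each gender category in generated stories against an even target split. The function name, the two-category setup, and the sample shares are all hypothetical and are not taken from the benchmark.

```python
def mean_absolute_error(observed, target):
    """Average absolute difference between paired values."""
    if len(observed) != len(target):
        raise ValueError("observed and target must have the same length")
    return sum(abs(o - t) for o, t in zip(observed, target)) / len(observed)


# Hypothetical shares of generated stories per gender category (assumption,
# not benchmark data), compared with an even 50/50 target.
observed = [0.80, 0.20]
target = [0.50, 0.50]

score = mean_absolute_error(observed, target)
print(round(score, 2))
```

Under this assumption a score of 0 would mean the generated stories match the even split exactly, so lower values in the table would indicate a closer match.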