Text-to-interleaved generation on RecipeGen (test)
[Chart: Temporal Coherence (GPT-4o). Highlighted point: Doubao, 4.35 (Dec 20, 2025).]
Updated 4d ago
Evaluation Results
Method                                  Date     Temporal Coherence      Instruction Following   Narrative Consistency   CLIP Score
                                                 GPT-4o      Human       GPT-4o      Human       GPT-4o      Human
Doubao (Model Type=Proprietary...)      2025.12  4.35        4.1         4.25        4.05        4.95        4.65        0.25
Loom (Framework=Multi-turn f...)        2025.12  4.25        4.15        3.75        3.35        4.7         4.3         0.269
Anole                                   2025.12  1.55        1.05        1.35        1.05        1.95        1.35        0.219
Bagel (Framework=Multi-turn d...)       2025.12  1.4         1.25        1.55        1.05        -           -           0.217
Janus-Pro (Framework=Multi-turn d...)   2025.12  1.05        1.1         1.1         1           -           -           0.105