Human Preference Evaluation on Cooking
[Chart: Step Faithfulness Win Rate. Stitch-a-Demo: 94 (Mar 18, 2025). Other metrics shown: Goal Faithfulness Win Rate, Visual Quality Win Rate, Overall Preference Win Rate.]
Evaluation Results
Method         Comparison baseline  Date     Step Faithfulness Win Rate  Goal Faithfulness Win Rate  Visual Quality Win Rate  Overall Preference Win Rate
Stitch-a-Demo  Sh...                2025.03  94                          94                          85                       98
Stitch-a-Demo  Re...                2025.03  77                          74                          74                       77
Stitch-a-Demo  In...                2025.03  75                          72                          78                       75