Human Evaluation Study

Benchmarks

| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Aesthetics | Human Evaluation Study | Average Rating Score | 3.664 | 8 |
| Multi-event Video Generation | Human Evaluation Study | Omission Score | 4.31 | 7 |
| Image-to-Video Generation | Human Evaluation Study | Human Preference (%) | 84 | 6 |
| 3D Indoor Scene Synthesis | Human Evaluation Study (Generated 3D Scenes) | Overall Score | 2.506 | 4 |
| Text-to-Video Generation | Human Evaluation Study | Human Preference | 81 | 4 |
| Counter-Speech Effectiveness Evaluation | Human Evaluation Study (Counter-Speech Post-Edited) | FACT | 3.458 | 3 |
| Video Generation | Human Evaluation Study (Aggregated across video generation categories) | Validity Rate | 69 | 3 |
| Social Deduction Game Agent Evaluation | Human Evaluation Study (Good Players) | Contributed Success | 3.9 | 2 |