Human Evaluation Study

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Aesthetics | Human Evaluation Study | Average Rating Score: 3.664 | 8 |
| Multi-event Video Generation | Human Evaluation Study | Omission Score: 4.31 | 7 |
| Image-to-Video Generation | Human Evaluation Study | Human Preference (%): 84 | 6 |
| Text-to-Video Generation | Human Evaluation Study | Human Preference: 81 | 4 |
| Video Generation | Human Evaluation Study (aggregated across video generation categories) | Validity Rate: 69 | 3 |
| Social Deduction Game Agent Evaluation | Human Evaluation Study (Good Players) | Contributed Success: 3.9 | 2 |