
Human Preference Evaluation

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Code-switched text generation | English-to-Code-Switched human preference evaluation (Out of domain) | Score: 434.5 | 6
Image Generation | Human Preference Evaluation 55 prompts | Votes: 500 | 6
Human Preference Evaluation | Human Preference Evaluation 371 prompts (test) | Recall@1: 39.89 | 3
Human Preference Evaluation | Human Preference Evaluation 466 prompts (test) | Preference Accuracy: 65.14 | 3