
Generative tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Generative task evaluation | Generative tasks single-preference setting | Accuracy (Ignored): 78 | 16 |
| Generative tasks | Generative tasks multi-preference setting | Macro Acc (IA): 63 | 16 |