
NOVELTYBENCH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Text Generation | NoveltyBench | Diversity: 10 | 81 |
| Patience-discounted reward evaluation | NoveltyBench | Utility: 4.096 | 36 |
| Output Diversity | NOVELTYBENCH | Distinct Score: 52.42 | 31 |
| Instruction Following | NoveltyBench | Lexical Dominance: 40.1 | 7 |
| Novelty Evaluation | NoveltyBench | Overall Dominance: 44 | 5 |
| Diversity Measurement | NoveltyBench curated | D_Can Mean: 48.1 | 4 |
| Human Evaluation | NoveltyBench | Quality: 4.04 | 2 |