
Quality, Factuality, and Safety Evaluation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Language Model Evaluation | Quality, Factuality, and Safety Evaluation Suite (test) | Generation Quality Score: 86.3 | 7 |