
Just-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Utility Evaluation | Just-Eval | Just-Eval Average Score: 4.83 | 50 |
| Model Helpfulness Evaluation | Just-Eval (test) | Helpfulness Score: 4.96 | 42 |
| Benign prompt classification | Just-Eval benign | Accuracy: 99 | 15 |
| Instruction-following | Just-Eval | Helpfulness: 4.25 | 10 |
| General Usability Evaluation | Just Eval | Helpfulness: 4.78 | 6 |