
Just-Eval

Benchmarks

Task Name                      Dataset Name       SOTA Result                      Trend
Model Helpfulness Evaluation   Just-Eval (test)   Helpfulness Score: 4.96          42
Utility Evaluation             Just-Eval          Just-Eval Average Score: 4.83    18
Instruction-following          Just-Eval          Helpfulness: 4.25                10