Just-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Model Helpfulness Evaluation | Just-Eval (test) | Helpfulness Score: 4.96 | 42 |
| Utility Evaluation | Just-Eval | Just-Eval Average Score: 4.83 | 18 |
| Instruction-following | Just-Eval | Helpfulness: 4.25 | 10 |