| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Utility Evaluation | Just-Eval | Just-Eval Average Score: 4.83 | 50 |
| Model Helpfulness Evaluation | Just-Eval (test) | Helpfulness Score: 4.96 | 42 |
| Benign prompt classification | Just-Eval benign | Accuracy: 99 | 15 |
| Instruction-following | Just-Eval | Helpfulness: 4.25 | 10 |
| General Usability Evaluation | Just Eval | Helpfulness: 4.78 | 6 |