| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| RoleBench Instruction Generalization | RoleGPT | CUS Score: 57.6 | 10 | 4d ago | |
| RoleBench Chinese instruction generalization 1.0 | RoleGPT | ROUGE-L (CUS): 53.7 | 7 | 4d ago | |
| RoleBench instruction generalization | RoleLLaMA-7B | GPT-4 Win Rate: 55.8 | 5 | 4d ago | |