| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RoleBench Instruction Generalization | RoleGPT | CUS Score | 57.6 | 10 | 3mo ago |
| RoleBench Chinese instruction generalization 1.0 | RoleGPT | ROUGE-L (CUS) | 53.7 | 7 | 3mo ago |
| OOD Instructions Novel Prompts | DeLock | T3 Success Count [C] | 95 | 6 | 1mo ago |
| RoleBench instruction generalization | RoleLLaMA-7B | GPT-4 Win Rate | 55.8 | 5 | 3mo ago |