| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Role Fidelity | RoleBench (test) | RAW Score 36.4 | 10 |
| Instruction Generalization | RoleBench Instruction Generalization | CUS Score 57.6 | 10 |
| Role Generalization | RoleBench English 1.0 (Role Generalization) | CUS Score 60.2 | 7 |
| Instruction Generalization | RoleBench Chinese instruction generalization 1.0 | ROUGE-L (CUS) 53.7 | 7 |
| Instruction Generalization | RoleBench instruction generalization | GPT-4 Win Rate 55.8 | 5 |
| Role-playing | RoleBench Chinese (instruction generalization) | Win Rate (vs GPT-4) 36.4 | 4 |
| Role-playing Instruction Following | RoleBench English Role Generalization | Win Rate (GPT-4) 64.5 | 4 |