| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Role-playing Dialogue Evaluation | RoleChat gold-standard evaluation set | Logical Coherence: 96.6 | 10 |
| Role-playing Evaluation | RoleChat (val) | Overall MSE: 0.21 | 5 |
| Role-playing Quality Evaluation | RoleChat (Human Evaluation) | Win Rate: 87 | 2 |