| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Personalized Generation | Roleplay (test) | Accuracy | 72.36 | 10 |
| Preference Optimization | Roleplay 1500 users | Win rate | 90.9 | 10 |
| Jailbreak Detection | Roleplay | Detection Rate | 100 | 4 |
| Roleplay Personalization | Roleplay Human Evaluation (50 users, 11 questions) | Win rate | 72.3 | 2 |