| Dataset Name | SOTA Method | Metric | Value | Count | Updated |
|---|---|---|---|---|---|
| RoleLLM | | GPT-4o Win Rate | 94.46 | 28 | 4d ago |
| CharacterBench 1.0 (test) | Qwen2.5-7B-Instruct + Character-R1 | MC | 4.444 | 28 | 4d ago |
| RPGBench Aggregate (Overall) | CoRL | Avg Score | -0.026 | 18 | 4d ago |
| RPGBench Dialogue Shift (Generalization) | | Turn Composition | -0.956 | 18 | 4d ago |
| RPGBench Character Shift (Generalization) | | Deviation Score (Literature) | -0.8 | 18 | 4d ago |
| RPGBench User Shift Generalization | CoRL | RP Score (German) | -0.016 | 18 | 4d ago |
| RPGBench In-distribution | RL | R-EMI | -0.034 | 18 | 4d ago |
| Role-playing evaluation (Main characters) | Codified FSM | ROUGE-L (Haruhi) | 83.88 | 12 | 4d ago |
| RoleMRC | MOA (GRPO) | KR | 0.67 | 7 | 4d ago |
| PersonaGym | | Engagement (EA) | 4.98 | 7 | 4d ago |
| Role-playing evaluation (Minor characters) | Codified Profile | K-On! ROUGE-L | 21.21 | 5 | 4d ago |
| CharacterEval unseen roles transfer setting | Baichuan2-7B + PCL | CC | 2.749 | 4 | 4d ago |
| Role-playing human assessment set (test) | Baichuan2-7B + PCL | Win Count | 602 | 4 | 4d ago |
| RoleBench Chinese (instruction generalization) | RoleGLM | Win Rate (vs GPT-4) | 36.4 | 4 | 4d ago |