SOTA Role-playing benchmarks and papers with code | Wizwand
Role-playing
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| CharacterBench | CRPO | MC | 4.525 | 50 | 8d ago |
| CharacterBench latest (full) | CRPO | Overall Score | 4.525 | 47 | 8d ago |
| RoleLLM | Full System Prompt | GPT-4o Win Rate | 94.46 | 28 | 3mo ago |
| CharacterBench 1.0 (test) | Qwen2.5-7B-Instruct + Character-R1 | MC | 4.444 | 28 | 3mo ago |
| RPGBench Aggregate (Overall) | CoRL | Avg Score | -0.026 | 18 | 3mo ago |
| RPGBench Dialogue Shift (Generalization) | RFT | Turn Composition | -0.956 | 18 | 3mo ago |
| RPGBench Character Shift (Generalization) | RFT | Deviation Score (Literature) | -0.8 | 18 | 3mo ago |
| RPGBench User Shift Generalization | CoRL | RP Score (German) | -0.016 | 18 | 3mo ago |
| RPGBench In-distribution | RL | R-EMI | -0.034 | 18 | 3mo ago |
| Role-playing evaluation (Main characters) | Codified FSM | ROUGE-L (Haruhi) | 83.88 | 12 | 3mo ago |
| DynSess-Eval Human Evaluation 1.0 (test) | Doubao-1.5-pro-character | Average Score | 3.38 | 10 | 5d ago |
| RoleMRC | MOA (GRPO) | KR | 0.67 | 7 | 3mo ago |
| PersonaGym | GPT-4o | Engagement (EA) | 4.98 | 7 | 3mo ago |
| Role-playing evaluation (Minor characters) | Codified Profile | K-On! ROUGE-L | 21.21 | 5 | 3mo ago |
| CharacterEval unseen roles transfer setting | Baichuan2-7B + PCL | CC | 2.749 | 4 | 3mo ago |
| Role-playing human assessment set (test) | Baichuan2-7B + PCL | Win Count | 602 | 4 | 3mo ago |
| RoleBench Chinese (instruction generalization) | RoleGLM | Win Rate (vs GPT-4) | 36.4 | 4 | 3mo ago |
Showing 17 of 17 rows