Role-playing Quality Evaluation on RoleChat (Human Evaluation)
[Chart: Win Rate and Lose Rate over time. Highlighted point: RoleJudge, Win Rate 87, Apr 15, 2026.]
Updated 1mo ago
Evaluation Results
Method     Configuration              Links    Win Rate (%)  Lose Rate (%)
RoleJudge  Comparison Baseline=Qw...  2026.04  87            13
RoleJudge  Comparison Baseline=Ge...  2026.04  79            21