RoleChat

Benchmarks

| Task | Dataset | Metric | SOTA result | Trend |
|---|---|---|---|---|
| Role-playing Dialogue Evaluation | RoleChat gold-standard evaluation set | Logical Coherence | 96.6 | 10 |
| Role-playing Evaluation | RoleChat (val) | Overall MSE | 0.21 | 5 |
| Role-playing Quality Evaluation | RoleChat (Human Evaluation) | Win Rate | 87 | 2 |