
Roleplay

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Personalized Generation | Roleplay (test) | Accuracy: 72.36 | 10 |
| Preference Optimization | Roleplay 1500 users | Winrate: 90.9 | 10 |
| Jailbreak Detection | Roleplay | Detection Rate: 100 | 4 |
| Roleplay Personalization | Roleplay Human Evaluation (50 users, 11 questions) | Winrate: 72.3 | 2 |