
RoleBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Role Fidelity | RoleBench (test) | RAW Score | 36.4 | 10 |
| Instruction Generalization | RoleBench Instruction Generalization | CUS Score | 57.6 | 10 |
| Role Generalization | RoleBench English 1.0 (Role Generalization) | CUS Score | 60.2 | 7 |
| Instruction Generalization | RoleBench Chinese instruction generalization 1.0 | ROUGE-L (CUS) | 53.7 | 7 |
| Instruction Generalization | RoleBench instruction generalization | GPT-4 Win Rate | 55.8 | 5 |
| Role-playing | RoleBench Chinese (instruction generalization) | Win Rate (vs GPT-4) | 36.4 | 4 |
| Role-playing Instruction Following | RoleBench English Role Generalization | Win Rate (GPT-4) | 64.5 | 4 |