
RPGBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Role-Playing | RPGBench Aggregate (Overall) | Avg Score-0.026 | 18 |
| Role-Playing | RPGBench Dialogue Shift (Generalization) | Turn Composition-0.956 | 18 |
| Role-Playing | RPGBench Character Shift (Generalization) | Deviation Score (Literature)-0.8 | 18 |
| Role-Playing | RPGBench User Shift Generalization | RP Score (German)-0.016 | 18 |
| Role-Playing | RPGBench In-distribution | R-EMI-0.034 | 18 |