Role-Playing on RPGBench Character Shift (Generalization)

-0.8 Deviation Score (Literature)
RFT

[Chart: y-axis ticks -0.83076, -0.62313, -0.4155, -0.20787; x-axis date Dec 19, 2025]
Updated 4d ago

Evaluation Results

Method  Links
2025.12  -0.8    -0.78   -0.828  -0.812
2025.12  -0.579  -0.479  -0.519  -0.586
2025.12  -0.564  -0.488  -0.586  -0.631
2025.12  -0.372  -0.282  -0.321  -0.349
2025.12  -0.34   -0.312  -0.355  -0.354
2025.12  -0.254  -0.141  -0.222  -0.245
2025.12  -0.198  -0.137  -0.191  -0.186
2025.12  -0.18   -0.184  -0.193  -0.186
2025.12  -0.131  -0.093  -0.141  -0.128
2025.12  -0.111  -0.062  -0.078  -0.079
2025.12  -0.104  -0.05   -0.08   -0.11
2025.12  -0.102  -0.072  -0.089  -0.094
2025.12  -0.082  -0.026  -0.072  -0.056
2025.12  -0.078  -0.021  -0.073  -0.051
2025.12  -0.061  -0.033  -0.069  -0.077
2025.12  -0.056  -0.008  -0.076  -0.042
2025.12  -0.049  -0.04   -0.038  -0.035
2025.12  -0.031  -0.007  -0.021  -0.029