Role-playing evaluation on RoleBench
Top result: DPO-Qwen3-8B, Win Rate 37.1 (May 16, 2026). Updated 16d ago.
Evaluation Results

Method         Stage   Date      Win Rate
DPO-Qwen3-8B   DPO     2026.05   37.1
GPT-4.1        -       2026.05   34.3
SFT-Qwen3-8B   SFT     2026.05   28.6
Qwen3-8B       Base    2026.05   0