Preference Alignment on PKU-SafeRLHF (test)
[Chart: Win Rate, Tie Rate, and Lost Rate over time (Dec 16, 2025). Best Win Rate: 28.69, Qwen-1.7B-DPO.]
Evaluation Results

Method            Details                 Date      Win Rate (%)   Tie Rate (%)   Lost Rate (%)
Qwen-1.7B-DPO     Judge=DeepSeek-V3 API   2025.12   28.69          53.95          17.35
Qwen-1.7B-SGRPO   Judge=DeepSeek-V3 API   2025.12   22.82          52.09          25.09
Qwen-1.7B-SFT     Judge=DeepSeek-V3 API   2025.12   0.95           2.41           96.62
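In each row, the Win, Tie, and Lost Rates add up to about 100. This fits a reading where each rate is the percentage of test prompts on which the judge (DeepSeek-V3) rated the method's response as better than, equal to, or worse than a comparison response. The page does not say what the comparison response is or what format the judge outputs. The sketch below therefore assumes a plain list of per-prompt verdict labels. The labels ("win", "tie", "lose") and the function name `outcome_rates` are hypothetical.

```python
from collections import Counter

# Hypothetical verdict labels; the benchmark's actual judge output format is not documented here.
VERDICTS = ("win", "tie", "lose")


def outcome_rates(verdicts):
    """Return (win, tie, lost) as percentages of all judged prompts, rounded to 2 decimals."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to score")
    return tuple(round(100 * counts[v] / total, 2) for v in VERDICTS)


# Toy example with made-up verdicts (not benchmark data).
example = ["win"] * 3 + ["tie"] * 5 + ["lose"] * 2
print(outcome_rates(example))  # (30.0, 50.0, 20.0)

# Sanity check on the reported rows: the rounded rates should sum to roughly 100.
reported = {
    "Qwen-1.7B-DPO": (28.69, 53.95, 17.35),
    "Qwen-1.7B-SGRPO": (22.82, 52.09, 25.09),
    "Qwen-1.7B-SFT": (0.95, 2.41, 96.62),
}
for name, rates in reported.items():
    assert abs(sum(rates) - 100) < 0.05, name
```

Under this reading, a method can lose ground on Win Rate while still improving overall if it moves responses from Lost to Tie. That is why all three rates are listed instead of Win Rate alone.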