Preference Alignment on HH and UF Out-of-Domain (test)
[Chart: OOD Win Rate by date (May 22, 2026). Top result: TPMM-DPO, 58.7. Selectable metrics: OOD Win Rate, OOD Gold Reward.]
Evaluation Results
Method                                 Date     OOD Win Rate  OOD Gold Reward
TPMM-DPO (Merging Strategy=Learn...)   2026.05  58.7          0.287
rDPO                                   2026.05  57.9          0.271
TPMM-DPO (Merging Strategy=Learn...)   2026.05  57.6          0.242
DPO (Iteration=2)                      2026.05  57.4          0.238
TPMM-DPO (Merging Strategy=Simpl...)   2026.05  57.2          0.266
TPMM-DPO (Merging Strategy=Simpl...)   2026.05  57            0.234
sDPO                                   2026.05  55.3          0.226
DPO (Iteration=1)                      2026.05  54.2          0.214
DPO (Iteration=3)                      2026.05  51.1          0.201
SFT                                    2026.05  50            0.105