Preference Alignment on HH and UF In-Domain (test)
[Chart: Win Rate. Top result: TPMM-DPO, 68.4 (May 22, 2026). Metrics: Win Rate, Gold Reward.]
Updated 8d ago
Evaluation Results
Method     Setting                     Date      Win Rate   Gold Reward
TPMM-DPO   Merging Strategy=Learn...   2026.05   68.4       0.681
rDPO       -                           2026.05   65.9       0.664
TPMM-DPO   Merging Strategy=Simpl...   2026.05   63.7       0.629
TPMM-DPO   Merging Strategy=Learn...   2026.05   62.9       0.671
TPMM-DPO   Merging Strategy=Simpl...   2026.05   62.4       0.643
sDPO       -                           2026.05   61.8       0.627
DPO        Iteration=2                 2026.05   61.5       0.621
DPO        Iteration=3                 2026.05   59.7       0.589
DPO        Iteration=1                 2026.05   56.8       0.613
SFT        -                           2026.05   50         0.582