
Preference Alignment on HH and UF In-Domain (test)

68.4 Win Rate

TPMM-DPO

Updated 8d ago

Evaluation Results

Method	Links	Win Rate	(second metric, unlabeled in source)
TPMM-DPO 2026.05		68.4	0.681
rDPO 2026.05		65.9	0.664
TPMM-DPO 2026.05		63.7	0.629
TPMM-DPO 2026.05		62.9	0.671
TPMM-DPO 2026.05		62.4	0.643
sDPO 2026.05		61.8	0.627
DPO 2026.05		61.5	0.621
DPO 2026.05		59.7	0.589
DPO 2026.05		56.8	0.613
SFT 2026.05		50	0.582