
Helpfulness on Template T3 GPT-4 evaluation (test)

Win Rate: 91.62

SafeDPO

Updated 4mo ago

Evaluation Results

Method	Links	Win Rate	(unlabeled)	(unlabeled)
SafeDPO 2025.05		91.62	1.12	7.25
SafeRLHF 2025.05		67.5	8.75	23.75
DPO-HARMLESS 2025.05		65.88	16.75	17.38
DPO-SAFEBETTER 2025.05		64.25	28	7.75
DPO-HELPFUL 2025.05		46.62	35.25	18.12