
Model Alignment on HH-RLHF D3 (test)

Harmlessness BLEU Score: 32.77

DEFT-PRO

[Chart: Harmlessness BLEU Score by date; y-axis ticks 28.8804, 29.8902, 30.9, 31.9098; x-axis mark at Apr 2, 2026]
Updated 15d ago

Evaluation Results

Method    Links
2026.04   32.77   3.79   73.79   34.66   3.65   71.24   34.15   3.69   71.93
2026.04   32.03   3.95   71.45   36.77   4.16   73.12   35.49   4.1    72.67
2026.04   31.76   3.86   72.48   34.91   3.84   68.54   34.06   3.85   69.6
2026.04   29.4    3.56   72.95   33.5    3.64   68.49   33.5    3.62   69.69
2026.04   29.03   3.88   74.23   34.79   4.04   69.27   33.23   4      70.61