Model Alignment on HH-RLHF D2 (test)

20.13 Harmlessness BLEU

DEFT-DPO

Apr 2, 2026
Updated 15d ago

Evaluation Results

Method  Links
2026.04  20.13  2.87  65.35  30.08  3.15  60.21  27.39  3.07  61.6
2026.04  17.04  2.25  59.51  28.4   2.69  57.05  25.33  2.56  57.72
2026.04  8.54   1.77  62.21  22.58  2.7   58.43  18.78  2.45  59.45
2026.04  7.79   1.77  60.89  19.46  1.99  50.65  16.3   1.93  53.42
2026.04  7.72   1.75  61.3   20.27  2.06  53.07  16.87  1.98  55.29