
Alignment Reward Evaluation on HHA (test)

Top result: SFT+DPO, Harmless Score 64

[Chart of results; data point dated Feb 16, 2025]

Evaluation Results

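| # | Harmless Score | Score 2 | Score 3 | Score 4 | Mean (Scores 2-4) | Mean (all four) | Date |
|---|---|---|---|---|---|---|---|
| 1 | 64 | 69.44 | 71.01 | 71.56 | 70.67 | 69 | 2025.02 |
| 2 | 64 | 67.93 | 71.32 | 70.83 | 70.02 | 68.52 | 2025.02 |
| 3 | 63.24 | 68.13 | 71.13 | 70.78 | 70.01 | 68.32 | 2025.02 |
| 4 | 62.38 | 69.16 | 70 | 70.61 | 69.93 | 68.04 | 2025.02 |
| 5 | 61.29 | 63.27 | 66.07 | 65.75 | 65.03 | 64.09 | 2025.02 |
| 6 | 60.19 | 65.96 | 68.94 | 68.27 | 67.72 | 65.84 | 2025.02 |
| 7 | 58.45 | 58.94 | 62.94 | 61.89 | 61.26 | 60.56 | 2025.02 |
| 8 | 57.97 | 60.23 | 64.92 | 62.92 | 62.69 | 61.51 | 2025.02 |
| 9 | 57.18 | 61.13 | 65.57 | 64.33 | 63.68 | 62.06 | 2025.02 |
| 10 | 56.43 | 64.65 | 64.95 | 65.9 | 65.16 | 62.98 | 2025.02 |
| 11 | 56.17 | 59.89 | 61.02 | 60.86 | 60.59 | 59.48 | 2025.02 |
| 12 | 55.19 | 59.9 | 60.61 | 61.26 | 60.59 | 59.24 | 2025.02 |
| 13 | 53.71 | 59.38 | 60.04 | 60.55 | 59.99 | 58.42 | 2025.02 |
| 14 | 52.48 | 57.35 | 60.58 | 59.38 | 59.1 | 57.44 | 2025.02 |
| 15 | 50.52 | 50.01 | 53.68 | 52.24 | 51.98 | 51.61 | 2025.02 |
| 16 | 45.5 | 44.45 | 47.07 | 45.61 | 45.71 | 45.66 | 2025.02 |
| 17 | 38.73 | 34.74 | 39.96 | 37.3 | 37.33 | 37.68 | 2025.02 |
| 18 | 37.03 | 20.51 | 24.04 | 21.93 | 22.16 | 25.88 | 2025.02 |
| 19 | 35.05 | 26.5 | 31.15 | 28.6 | 28.75 | 30.33 | 2025.02 |
| 20 | 33.06 | 27.39 | 29.53 | 28.36 | 28.43 | 29.59 | 2025.02 |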
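The last two numbers in each result row appear to be aggregates rather than independent metrics. The fifth matches the mean of the second through fourth scores, and the sixth matches the mean of the first four, up to two-decimal rounding. A minimal Python sketch that checks this reading on the first three rows follows. The values are copied from the results. The tuple layout, the `check_row` helper, and the 0.01 tolerance are illustrative assumptions, not part of the leaderboard.

```python
# Rows copied from the results (assumed layout):
# (harmless, score2, score3, score4, col5, col6)
ROWS = [
    (64.00, 69.44, 71.01, 71.56, 70.67, 69.00),
    (64.00, 67.93, 71.32, 70.83, 70.02, 68.52),
    (63.24, 68.13, 71.13, 70.78, 70.01, 68.32),
]


def check_row(row, tol=0.01):
    """Hypothetical helper: True if col5 ~ mean(score2..score4)
    and col6 ~ mean(harmless, score2, score3, score4), within `tol`
    to allow for the two-decimal rounding of the reported values."""
    harmless, s2, s3, s4, col5, col6 = row
    mean_234 = (s2 + s3 + s4) / 3
    mean_all = (harmless + s2 + s3 + s4) / 4
    return abs(mean_234 - col5) <= tol and abs(mean_all - col6) <= tol


if __name__ == "__main__":
    print(all(check_row(r) for r in ROWS))  # prints True
```

The tolerance matters. Because every input score is itself rounded, a recomputed mean can differ from the reported one in the second decimal place. Row 9 is an example: 62.0525 is recomputed against a reported 62.06. An exact equality check would therefore reject rows that are in fact consistent.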