Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Alignment Reward Evaluation on OffsetBias (test)

67.94Reward

SFT+DPO

19.122431.796244.4757.1438Feb 16, 2025
Updated 26d ago

Evaluation Results

MethodLinks
2025.02
67.94
2025.02
66.44
2025.02
66.06
2025.02
65.88
2025.02
65.81
2025.02
65.75
2025.02
65.63
2025.02
65.63
2025.02
65.38
2025.02
65.38
2025.02
64.75
2025.02
63.91
2025.02
63.81
2025.02
63.72
2025.02
62.94
2025.02
62.59
2025.02
61.97
2025.02
61.44
2025.02
60.88
2025.02
60.88
2025.02
60.63
2025.02
59.5
2025.02
59.09
2025.02
59.09
2025.02
59.09
2025.02
58.97
2025.02
57.69
2025.02
57.63
2025.02
56.91
2025.02
56.81
2025.02
56.63
2025.02
55.88
2025.02
54.09
2025.02
53.94
2025.02
53.44
2025.02
52.63
2025.02
51.75
2025.02
51.38
2025.02
45.16
2025.02
45.06
2025.02
40.38
2025.02
38.97
2025.02
37.31
2025.02
32.38
2025.02
30.42
2025.02
30.23
2025.02
27.05
2025.02
26.61
2025.02
26.52
2025.02
21