Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Harmful score evaluation on BeaverTails (test)

28.4Harmful Score

SFT

-0.9286.68614.321.914Feb 5, 2026Feb 21, 2026Mar 9, 2026Mar 26, 2026Apr 11, 2026Apr 27, 2026May 14, 2026
Updated 19d ago

Evaluation Results

MethodLinks
2026.02
28.4
2026.02
28
2026.02
27.6
2026.02
27.6
2026.02
25.84
2026.02
25.8
2026.02
24.7
2026.02
24.4
2026.02
24.1
2026.02
23.1
2026.02
21
2026.02
20.6
2026.02
19.84
2026.02
18.5
2026.02
17.8
17.6
2026.02
16.86
2026.02
16.8
2026.02
16.3
2026.02
16.1
2026.02
15.96
2026.02
15.1
2026.02
14.8
2026.02
13.88
2026.02
9.7
2026.02
9.6
2026.02
9.4
2026.02
9.3
2026.02
8.94
2026.02
8.9
2026.02
8.9
2026.02
8.8
2026.02
8.42
2026.02
8.4
2026.02
8.3
2026.02
5.5
2026.05
0.738
2026.05
0.707
2026.05
0.707
2026.05
0.705
2026.05
0.705
2026.05
0.701
2026.05
0.693
2026.05
0.691
2026.05
0.5
2026.05
0.416
2026.05
0.375
2026.05
0.324
2026.05
0.307
2026.05
0.239
2026.05
0.229
2026.05
0.2