Reward Model Transfer on Anthropic Harmless (AHar)

4.62 AOG

ANCHOR

May 25, 2026
Updated 8d ago

Evaluation Results

Method   Links
2026.05  4.62   -      -     4.62
2026.05  2.62   -      -     2.62
2026.05  2.44   -      -     2.44
2026.05  2.26   -      -     2.26
2026.05  1.53   -      -     1.02
2026.05  1.43   -      -     1.43
2026.05  1.4    -      -     1.4
2026.05  1.35   -      -     1.35
2026.05  -      122.6  3.09  -
2026.05  -      134.5  3.39  -
2026.05  -      160.9  1.48  -
2026.05  -      160.9  1.48  -