
Reward Hacking Mitigation on Synthetic Goodhart 1.0 (Evaluation)

4.38 R_g

IR3 Method B (Adversarial)

[Chart: R_g over time; y-axis ticks at 3.4856, 3.7178, 3.95, 4.1822; x-axis label Feb 23, 2026]
Updated 4d ago

Evaluation Results

Date       R_g     (unlabeled)
2026.02    4.38    0.41
2026.02    4.35    0.45
2026.02    4.21    0.62
2026.02    4.12    0.71
2026.02    3.95    1.08
2026.02    3.85    1.32
2026.02    3.78    1.42
2026.02    3.72    1.48
2026.02    3.68    1.55
-          3.52    1.86