Helpful Assistant Alignment on Helpful Assistant normalized rewards (test)

Helpfulness Reward (r1): 53

MOPO-Lag

Dec 11, 2025
Updated 4d ago

Evaluation Results

Date      Helpfulness Reward (r1)   Second score (unlabeled)
2025.12   53                        51
2025.12   52                        53
2025.12   51                        52
2025.12   49                        48
2025.12   49                        47
2025.12   49                        48
2025.12   48                        46
2025.12   48                        47
2025.12   48                        50
2025.12   45                        43
2025.12   45                        47
2025.12   45                        48
2025.12   45                        53
2025.12   45                        47
2025.12   44                        41
2025.12   44                        45
2025.12   43                        41
2025.12   43                        46
2025.12   43                        41
2025.12   43                        44
2025.12   42                        40
2025.12   42                        50
2025.12   41                        39
2025.12   41                        47
2025.12   41                        39
2025.12   41                        46
2025.12   40                        33
2025.12   40                        42
2025.12   40                        39
2025.12   40                        42
2025.12   40                        40
2025.12   39                        37
2025.12   38                        37
2025.12   38                        41
2025.12   37                        36
2025.12   37                        40
2025.12   37                        45
2025.12   36                        39
2025.12   36                        35
2025.12   36                        39
2025.12   35                        33
2025.12   35                        33
2025.12   35                        44
2025.12   35                        40
2025.12   34                        38
2025.12   34                        35
2025.12   33                        32
2025.12   32                        33
2025.12   32                        30
2025.12   31                        30
2025.12   31                        27
2025.12   29                        26
2025.12   28                        24
2025.12   28                        26
2025.12   27                        24
2025.12   27                        25
2025.12   25                        24
2025.12   23                        21
2025.12   22                        19
2025.12   21                        20