| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Helpful Assistant Alignment | Helpful Assistant normalized rewards (test) | Helpfulness Reward (r1): 53 | 60 |
| Multi-Objective Optimization | Helpful Assistant Harmless-helpful | MPD: 1.015 | 4 |
| Multi-Objective Optimization | Helpful Assistant Humor-helpful | MPD: 1.439 | 4 |