| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reward Modeling | RMB | Accuracy | 89.3 | 120 |
| Reward Modeling | RMB (test) | Score | 89.3 | 22 |
| Preference Evaluation | RMB Best-of-N | Helpfulness Score (BoN) | 86.2 | 16 |
| Reward Modeling | RMB | Help Accuracy | 88.6 | 13 |
| Best-of-N Evaluation | RMB | Accuracy | 59.69 | 2 |