| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Pairwise Preference Comparisons 1.5B Scale (test) | STACKELBERGGDA FOLLOWER | Avg Preference Score | 0.834 | 30 | 3mo ago |
| Qwen2.5-3B responses (test) | NASH-MD-PG | Avg Preference Score | 82.7 | 30 | 3mo ago |
| 150 prompt-response pairs | RL-trained policy | Win Rate | 63.3333 | 9 | 5d ago |
| HH-RLHF held-out (test) | DP-RLHF | Win Rate | 53.02 | 6 | 2mo ago |