| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| LS1 -> LS2 (test) | PCogAlign | RSA 4.032 | 13 | 3mo ago | |
| Assistant and Summary personalization tasks (test) | vol-mo | Win Rate 83.91 | 12 | 3mo ago | |
| StereoSet Explicit Preference (test) | TriAlign | Preference Score 79.1 | 8 | 1d ago | |
| StereoSet Implicit Preference (test) | TriAlign | Pref Score 0.474 | 8 | 1d ago | |
| AIME Explicit Preference 2025 (test) | TriAlign | Pref 69.9 | 8 | 1d ago | |
| AIME Implicit Preference 2025 (test) | TriAlign | Preference Score 0.221 | 8 | 1d ago | |
| Review Interpolated Users | FSPO + RAT | Winrate 84.6 | 8 | 1mo ago | |
| Review Trained Users | FSPO + RAT | Winrate 92.3 | 8 | 1mo ago | |
| Real-world failure cases from large-scale commercial PA | RP-Reasoner | Macro Accuracy 73.4 | 4 | 3mo ago | |
| RPEVAL | RP-Reasoner | Macro Accuracy 24 | 4 | 3mo ago | |