| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| PRISM Personalized | P-GenRM | Accuracy 68.06 | 44 | 4d ago | |
| Chatbot Arena Personalized | P-GenRM | Accuracy 75.92 | 42 | 4d ago | |
| BESPOKE-Meta OOD | P-CHECK | Binary Preference Accuracy 75.48 | 18 | 4d ago | |
| Reddit TLDR 150 examples Overall | MRM | User-level Accuracy 69.7 | 11 | 4d ago | |
| Reddit TLDR 150 examples Unseen | MRM | User-level Accuracy 69.8 | 11 | 4d ago | |
| Reddit TLDR 150 examples Seen | MRM | User-level Accuracy 69.7 | 11 | 4d ago | |
| Reddit TLDR 100 examples Overall | MRM | User-level Accuracy 69.6 | 11 | 4d ago | |
| Reddit TLDR 100 examples Unseen | MRM | User-level Accuracy 69.6 | 11 | 4d ago | |
| Reddit TLDR 100 examples Seen | MRM | User-level Accuracy 69.6 | 11 | 4d ago | |
| PRISM Overall | MRM | User-level Accuracy 65.3 | 11 | 4d ago | |
| PRISM Unseen | MRM | User-level Accuracy 0.652 | 11 | 4d ago | |
| PRISM Seen | MRM | User-level Accuracy 65.3 | 11 | 4d ago | |
| Lamp-QA (OOD) | | Arts Score 60 | 7 | 4d ago | |