| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Preference Prediction | PRISM (test) | Accuracy 66.62 | 51 |
| Personalization | PRISM | Personalization Win Rate 81.62 | 45 |
| Personalized Reward Modeling | PRISM Personalized | Accuracy 68.06 | 44 |
| Cultural Alignment | Prism | Rating 4.627 | 24 |
| LLM-as-a-judge | PRISM | Accuracy 59.38 | 20 |
| Preference Alignment | PRISM | Win-Rate (DPO) 74.5 | 20 |
| text-to-image generation | PRISM | Alignment Score 87.1 | 14 |
| LLM as a Judge | PRISM (test) | Accuracy 58.9 | 14 |
| Emotion and Micro-expression Analysis | PRISM | Macro-expression Accuracy 80.2 | 13 |
| Phone Recognition | PRiSM Multilingual Datasets | PFER (DRC) 16.8 | 12 |
| Phone Recognition | PRiSM Accented English Datasets | PFER (Timing) 8.3 | 12 |
| Personalized Reward Modeling | PRISM Overall | User-level Accuracy 65.3 | 11 |
| Personalized Reward Modeling | PRISM Unseen | User-level Accuracy 0.652 | 11 |
| Personalized Reward Modeling | PRISM Seen | User-level Accuracy 65.3 | 11 |
| Pluralistic Reward Model Learning | PRISM | Accuracy 59.6 | 10 |
| Preference Alignment Evaluation | PRISM (test) | BT Score (Mean) 0.331 | 10 |
| User Simulation Intrinsic Evaluation | PRISM | First-Turn Diversity 94.55 | 8 |
| Population property estimation | PRISM | Bias (MAE) 0.46 | 8 |
| Reward Modeling | PRISM Overall | Accuracy 61.8 | 7 |
| Reward Modeling | PRISM Unseen | Accuracy 61.6 | 7 |
| Reward Modeling | PRISM Seen | Accuracy 62.1 | 7 |
| Full-body motion estimation | PRISM (test) | PA-MPJPE 31.29 | 6 |
| Systems Optimization | PRISM | Final Score 26.26 | 5 |
| Model Selection Evaluation | PRISM | Actual Score (per type) 93.2 | 5 |
| Preference Alignment | PRISM 1.0 (test) | Borda Average 2.393 | 5 |