| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| Chatbot Arena latest (test) | SynthesizeMe (+ FT RM + Personas) | Accuracy | 72.18 | 51 | 3mo ago |
| HPD v2 | | Accuracy | 85.4 | 25 | 8d ago |
| HPD v3 | Gemini 3.1 Pro + ARR | Accuracy | 78.3 | 21 | 22d ago |
| internal benchmark | Qwen-7B | Accuracy | 82.2 | 14 | 1mo ago |
| DocPairBench | DocReward-7B | Gov. Preference Score | 89.3 | 12 | 15d ago |
| VisForm | CMMS | Accuracy | 66.7 | 8 | 2mo ago |
| AGIQA | CMMS | Accuracy | 71.5 | 8 | 2mo ago |
| MHP dataset | MPS | Overall | 74.24 | 7 | 3mo ago |
| Pick-a-Pic (test) | PickScore | Accuracy | 70.5 | 7 | 3mo ago |
| VideoDPO | AesRM-Base | Accuracy (w/ Ties) | 58.46 | 6 | 1mo ago |
| Human Preference 22-joint | MotionReward | Accuracy | 86.09 | 6 | 2mo ago |
| Proprietary (test) | Image Score | OPA | 67.75 | 5 | 1mo ago |
| Human Preference 300 motion descriptions (test) | HuDA | Accuracy | 77.4 | 4 | 3mo ago |
| ImageReward 371 prompts (test) | ImageReward | Recall@1 | 39.62 | 4 | 3mo ago |
| ImageReward 466 prompts (test) | ImageReward | Preference Accuracy | 65.14 | 4 | 3mo ago |
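Most rows above report some form of pairwise preference accuracy. The sketch below shows one common way such a number can be computed: the fraction of item pairs where a scoring model ranks the human-preferred item higher. It is a generic illustration, not the evaluation code of any listed method or benchmark. Protocols differ (for example, how ties are treated in the VideoDPO "w/ Ties" variant, or how OPA and Recall@1 are defined), and the function name `pairwise_preference_accuracy` and its label encoding are hypothetical choices for this example.

```python
from typing import Sequence


def pairwise_preference_accuracy(
    score_a: Sequence[float],
    score_b: Sequence[float],
    human_label: Sequence[int],
) -> float:
    """Fraction of pairs where the model's ordering matches the human choice.

    Hypothetical encoding: human_label[i] is 0 if humans preferred item A,
    1 if they preferred item B. In this sketch, a pair where the model gives
    both items the same score is counted as incorrect; real benchmarks may
    handle ties differently.
    """
    if not (len(score_a) == len(score_b) == len(human_label)):
        raise ValueError("all inputs must have the same length")
    if len(human_label) == 0:
        raise ValueError("need at least one pair")

    correct = 0
    for a, b, label in zip(score_a, score_b, human_label):
        # Model's predicted preference: 0 for A, 1 for B, None on a tie.
        if a > b:
            predicted = 0
        elif b > a:
            predicted = 1
        else:
            predicted = None
        if predicted == label:
            correct += 1
    return correct / len(human_label)


# Four pairs: two ordered correctly, one tie, one ordered incorrectly.
print(pairwise_preference_accuracy([0.9, 0.2, 0.5, 0.4], [0.1, 0.8, 0.5, 0.3], [0, 1, 0, 1]))
# → 0.5
```

Counting model ties as errors is a conservative choice; an evaluation that awards half credit for ties, or that allows humans to label a pair as a tie, would report a different number on the same data.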