| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| HPD v2 (test) | HPSv3 | Preference Accuracy | 85.36 | 18 | 4d ago |
| ImageReward (test) | MPS | Preference Accuracy | 0.675 | 18 | 4d ago |
| ImageReward | SparVAR | Average Score | 1.0533 | 16 | 4d ago |
| HPS v2.1 | Infinity-8B | Photo Score | 29.49 | 16 | 4d ago |
| MHP Overall (test) | MPS | Preference Accuracy | 74.2 | 7 | 4d ago |
| VideoPhy 1.0 (test) | MAGI-1 | Physics Plausibility Win Rate | 59.3 | 4 | 4d ago |
| PhysicsIQ 1.0 (test) | MAGI-1 | Physics Plausibility Win Rate | 54.9 | 4 | 4d ago |
| Arena Creative Writing | JS | Win Rate | 23.4 | 3 | 4d ago |
| Arena-Hard v0.1 | JS | Win Rate | 56.7 | 3 | 4d ago |
| User Study (Group 2) | UAV-GPT | Category CT Score | 24 | 3 | 4d ago |
| EgoDex and DreamDojo-HV novel out-of-distribution (eval) | DreamDojo-14B | Physics Correctness | 73.5 | 3 | 4d ago |
| Human Preference Evaluation 371 prompts (test) | HPS | Recall @1 | 39.89 | 3 | 4d ago |
| Human Preference Evaluation 466 prompts (test) | ImageReward | Preference Accuracy | 65.14 | 3 | 4d ago |
| GenBlemish-27K (test) | Agentic Retoucher | Preference Share: Significantly Better | 48.8 | 2 | 4d ago |
| Driver Attention Dataset Study 2 | | Preference Rate | 59 | 2 | 4d ago |
| Driver Attention Dataset Study 1 | | Preference Rate | 71 | 2 | 4d ago |
| RichHF-18K (test) | Finetuned Muse | Preference Rate (≫) | 0.215 | 1 | 4d ago |