| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| LLaVA-Bench Wild | | Score | 102 | 52 | 3d ago |
| LLaVA Bench | GPT-4V (2023.11.06) | LLaVA Bench Score | 93.1 | 21 | 3d ago |
| PCogAlignBench (LS2) | GRPO (Latent Action) | LLM Judge Score | 0.852 | 20 | 3d ago |
| PCogAlignBench LS1 | DAPO (Latent Action) | LLM Judge Score | 0.903 | 20 | 3d ago |
| MMRole OOD | Dr.GRPO (Latent Action) | LLM-as-a-Judge Score | 91.6 | 20 | 3d ago |
| MMRole (ID) | Dr.GRPO (Latent Action) | LLM-as-a-Judge Score | 95.3 | 20 | 3d ago |
| Multimodality Chatbot Arena | LLaMA-Adapter v2 | Elo Rating | 1,023 | 8 | 3d ago |
| LLaVA-Bench In-the-Wild v1 (test) | ShareGPT4V-7B | General Score | 0.726 | 7 | 3d ago |