| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Manual Evaluation Set | MSelf-MOA | Average Helpfulness Score | 4.57 | 24 | 1mo ago |
| LLaVA-Bench | LLaVA-RLHF | Conversation Score | 93.1 | 11 | 1mo ago |
| MM-Vet2 (test) | | GPT-Eval Score | 54.4 | 10 | 16d ago |
| MTBench | GPT-4o | Helpfulness | 9.35 | 8 | 1mo ago |
| HH-RLHF helpful (test) | DeAL | Helpfulness Fraction | 77 | 7 | 1mo ago |
| Helpfulness (evaluation set) | SafeDPO | Win Rate | 84.05 | 5 | 1mo ago |
| HHH (test) | DPO + OGPSA | HHH Score | 90.68 | 3 | 1mo ago |