| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| InstructionFollow | | Accuracy | 63.2 | 32 | 22d ago |
| Manual Evaluation Set | MSelf-MOA | Average Helpfulness Score | 4.57 | 24 | 2mo ago |
| MTBench | GPT-4o | Helpfulness | 9.35 | 18 | 15d ago |
| Figstep-audio Harmful-Safe | SARSteer | BRR | 88.8 | 15 | 26d ago |
| AdvBench-audio Harmful-Safe | SARSteer | BRR Score | 86.83 | 12 | 26d ago |
| LLaVA-Bench | LLaVA-RLHF | Conversation Score | 93.1 | 11 | 3mo ago |
| MM-Vet2 (test) | | GPT-Eval Score | 54.4 | 10 | 2mo ago |
| HH-RLHF helpful (test) | DeAL | Helpfulness Fraction | 77 | 7 | 3mo ago |
| Helpfulness (evaluation set) | SafeDPO | Win Rate | 84.05 | 5 | 3mo ago |
| Pause-and-think B | pause-and-think (Ours) | Conciseness | 80.6 | 3 | 1d ago |
| HHH (test) | DPO + OGPSA | HHH Score | 90.68 | 3 | 3mo ago |