| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| AI Barometer Estonian Chatbot Arena 19.02.2026 | | Score 1,457 | 20 | 3mo ago | |
| MT-Bench High-Disagreement (Top 20%) | rDPO + DARC-ϵ | Human Score 8.72 | 13 | 2mo ago | |
| MT-Bench Overall | rDPO | Human Score 8.17 | 13 | 2mo ago | |
| Vicuna benchmark | GPT-4 | Elo Rating 13,481 | 8 | 3mo ago | |
| WildBench | | Overall Score 71.64 | 6 | 15d ago | |
| MT-Bench | Qwen3-32B + RLBFF training | Score (GPT-4-Turbo) 9.5 | 6 | 15d ago | |
| ArenaHard v2 | DeepSeek R1 | ArenaHard v2 Score 57.4 | 6 | 15d ago | |
| ArenaHard | AttentionPO | Win Rate 13.88 | 3 | 12d ago | |