| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| AI Barometer Estonian Chatbot Arena 19.02.2026 | | Score 1,457 | 20 | 1mo ago |
| MT-Bench High-Disagreement (Top 20%) | rDPO + DARC-ϵ | Human Score 8.72 | 13 | 1mo ago |
| MT-Bench Overall | rDPO | Human Score 8.17 | 13 | 1mo ago |
| Vicuna benchmark | GPT-4 | Elo Rating 13,481 | 8 | 1mo ago |
| ArenaHard v2 | Base | Hard Prompt Accuracy 14 | 4 | 1mo ago |