| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Knowledge-focused evaluation | MixEval Hard | Accuracy: 21.8 | 8 |
| Knowledge-focused evaluation | MixEval Standard | Accuracy: 33 | 8 |
| Conversational | MixEval | Score: 76 | 6 |
| Alignment | MixEval | Score: 86.7 | 5 |
| Alignment | MixEval v1 (test) | Accuracy: 76.5 | 4 |