| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reasoning | Humanity's Last Exam | Accuracy | 84.61 | 46 |
| Question Answering | Humanity's Last Exam | Pass@1 | 51.7 | 16 |
| Expert-Level Question Answering | Humanity's Last Exam | Accuracy | 40.9 | 14 |
| Question Answering | Humanity's Last Exam (HLE) MCQ | Accuracy | 19.9 | 6 |
| Long Context Evaluation | Humanity's Last Exam AA-LCR | Accuracy | 54.3 | 6 |
| World Knowledge | Humanity's Last Exam (text-only) | Score | 11.1 | 4 |