| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | General QA NQ, TriviaQA, PopQA (test) | Overall Average Score: 51.3 | 49 |
| General Question Answering | General QA NQ, TriviaQA, PopQA | NQ Accuracy: 51.8 | 34 |
| Complexity Prediction | General QA MMLU+MMLU-PRO+GSM8K | ROC-AUC: 89.1 | 3 |