| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | TyDi QA | Accuracy 69.4 | 43 |
| Context Attribution | TyDi QA random subset of 10,000 samples | Log-Probability Drop 0.893 | 12 |
| Attribution Quality Evaluation | TyDi QA | Log-Prob Drop 0.107 | 12 |
| Question Answering | TyDi QA No-context | F1 (Arabic) 42.6 | 4 |
| Question Answering | TyDi QA Gold Passage | Arabic F1 73.8 | 4 |
| Question Answering | TyDi QA less-mix (test) | F1 27.4 | 3 |