| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Open Question Answering | Natural Questions (NQ) (test) | Exact Match (EM) | 58.4 | 134 |
| Over-refusal Evaluation | NQ (Natural Questions) | ORR | 0 | 72 |
| Question Answering | Natural Questions (test) | EM | 61.65 | 72 |
| Retrieval Attack Defense | Natural Questions (NQ) | ASR | 0 | 64 |
| Retrieval | Natural Questions (test) | Top-5 Recall | 92.1 | 62 |
| Question Answering | NQ (Natural Questions) (test) | Accuracy | 68.6 | 60 |
| Question Answering | NQ (Natural Questions) | EM | 78.3 | 55 |
| Question Answering | Natural Questions | EM | 70.58 | 52 |
| Open Domain Question Answering | Natural Questions (NQ) | Exact Match (EM) | 51.4 | 46 |
| Question Answering | Natural Questions (NQ) (test) | Robust Accuracy | 68 | 45 |
| Passage retrieval | Natural Questions (NQ) (test) | Top-20 Accuracy | 85.2 | 45 |
| Embedding Alignment | Natural Questions (test) | Top-1 Accuracy | 100 | 40 |
| Question Answering | Natural Questions (NQ) | Accuracy | 49.3 | 36 |
| Open-QA Evaluation | EVOUNA-NaturalQuestions | F1 Score | 97.9 | 35 |
| Question Answering | Natural Questions (NQ) (test) | Exact Match | 60.4 | 35 |
| Open-Domain Question Answering | NQ (Natural Questions) | EM | 51.4 | 33 |
| Question Answering | NQ (Natural Questions) | EM | 42.5 | 28 |
| Passage Retrieval | Natural Questions (NQ) | Top-10 Accuracy | 66.59 | 28 |
| Closed-book Question Answering | Natural Questions (test) | Accuracy | 29.9 | 27 |
| Information Retrieval | Natural Questions (test) | Recall@20 | 86.1 | 25 |
| Single-hop QA | NQ (Natural Questions) | EM | 72 | 22 |
| Knowledge Evaluation | Natural Questions (NQ) (Evaluation) | Accuracy | 59.4 | 22 |
| Extractive Question Answering | Natural Questions MRQA | F1 Score | 81 | 22 |
| Question Answering | Natural Questions | Accuracy | 44.6 | 21 |
| RAG Question Answering | NQ (Natural Questions) | F1 Score | 54.06 | 20 |