| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | NQ | Accuracy | 66.6 | 108 |
| Hallucination Detection | NQ | AUC | 0.8645 | 102 |
| Hallucination Detection | NQ (test) | AUC ROC | 95.2 | 84 |
| Question Answering | NQ (test) | EM Accuracy | 58.4 | 66 |
| Question Answering | NQ | EM | 79 | 57 |
| Calibration | NQ | ECE | 0.046 | 55 |
| Retrieval-Augmented Generation (RAG) | NQ | Reliability Score (RS) | 54.33 | 52 |
| Table Question Answering | NQ-Table | F1 Score | 80.1 | 50 |
| End-to-end Open-Domain Question Answering | NQ (test) | Exact Match (EM) | 54 | 50 |
| General Question Answering | NQ | Exact Match (EM) | 54.8 | 36 |
| Information Retrieval | NQ320k | Hits@1 | 40.4 | 32 |
| Question Answering | NQ | Accuracy | 38 | 30 |
| Passage Ranking | NQ | MRR | 52.76 | 29 |
| Question Answering | NQ | EM | 39.5 | 28 |
| Prompt Injection Prevention | NQ simplified | Naïve Success Rate | 41 | 24 |
| Confidence Calibration in Retrieval-Augmented Generation | NQ k=5 OOD (test) | ECE | 0.248 | 24 |
| Question Answering | NQ-Open | Exact Match (EM) | 47.4 | 24 |
| Retrieval-Augmented Generation | NQ | Accuracy | 77.1 | 23 |
| Document Retrieval | NQ 320k (test) | Hits@1 | 63.4 | 23 |
| Document Retrieval | NQ 100K | Hits@1 | 27.5 | 23 |
| Document Retrieval | NQ10K | Hits@1 | 48.5 | 23 |
| Open-domain Question Answering | NQ (test) | EM | 44.38 | 22 |
| Question Answering | NQ | Faith | 0.9083 | 21 |
| Open-domain Question Answering | NQ-Open | Accuracy | 29 | 20 |
| Information Retrieval | NQ BEIR | nDCG@10 | 62.76 | 20 |