| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | NQ-Open | AUROC | 0.8843 | 63 |
| Question Answering | NQ-Open (val) | Accuracy | 49.62 | 46 |
| Question Answering | NQ-Open In-Domain (test) | Precision | 58.13 | 26 |
| Factual Question Answering | NQ-Open ID | Precision | 57.34 | 24 |
| Question Answering | NQ-open Augmented (full-slice) | Restate-hard | 85.42 | 18 |
| Question Answering | NQ-open v1.0 (test) | A1 | 79.08 | 16 |
| Question Answering | NQ-Open Out-of-distribution (test) | Accuracy | 49.93 | 15 |
| Hallucination Detection | NQ Open (test) | AUROC | 89.4 | 14 |
| Question Answering | NQ-Open (out-of-domain) | Precision | 0.705 | 12 |
| Question Answering | NQ-Open (test) | Mean F1 Score | 25.61 | 10 |
| Open-domain Question Answering | NQ-Open OOD (test) | Exact Match (EM) | 82.81 | 9 |