| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | TriviaQA | AUROC 0.98 | 621 |
| Hallucination Detection | TriviaQA (test) | AUC-ROC 98.3 | 243 |
| Question Answering | TriviaQA | Accuracy 86.68 | 238 |
| Predicting Answer Correctness | TriviaQA (val) | AUROC 0.904 | 199 |
| Question Answering | TriviaQA | EM 86.1 | 182 |
| Selective Prediction | TriviaQA (val) | PRR 87.9 | 175 |
| Single-hop Question Answering | TriviaQA | EM 72 | 133 |
| Question Answering | TriviaQA (test) | Accuracy 85.18 | 121 |
| Question Answering | TriviaQA | Accuracy 94.5 | 117 |
| Correctness Prediction | TriviaQA | AUROC 0.999 | 113 |
| Uncertainty Estimation | TriviaQA | AUROC 88 | 111 |
| Uncertainty Estimation | TriviaQA (test) | AUROC 87.91 | 110 |
| Open-domain Question Answering | TriviaQA | EM 76.1 | 88 |
| RAG Performance Prediction | TriviaQA | QE5 Score 0.889 | 80 |
| Question Answering | TriviaQA (test) | EM 92.1 | 80 |
| Open-Domain Question Answering | TriviaQA (test) | Exact Match 72.6 | 80 |
| Question Answering | TriviaQA | EM 62 | 71 |
| Passage retrieval | TriviaQA (test) | Top-100 Acc 90.1 | 67 |
| Selective Generation | TriviaQA | ROC-AUC 87.4 | 66 |
| Question Answering | TriviaQA | BS (%) 92.42 | 65 |
| Question Answering | TriviaQA | ACC 75 | 62 |
| Open-domain Question Answering | TriviaQA open (test) | EM 73.3 | 59 |
| Question Answering | TriviaQA (TQA) | EM 71.1 | 56 |
| General Question Answering | TriviaQA | Exact Match 69.02 | 54 |
| Retrieval-Augmented Generation (RAG) | TriviaQA | Reliability Score 80.67 | 52 |