| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | TruthfulQA | Accuracy 86.6 | 152 |
| Hallucination Detection | TruthfulQA (test) | AUC-ROC 89.5 | 105 |
| Truthfulness Evaluation | TruthfulQA | Accuracy 70.8 | 103 |
| Hallucination Detection | TruthfulQA | AUC (ROC) 0.9417 | 102 |
| Truthfulness | TruthfulQA | Truthfulness Accuracy 97.55 | 86 |
| Multiple-Choice | TruthfulQA | MC1 Accuracy 58.5 | 83 |
| Question Answering | TruthfulQA | Accuracy 86.64 | 73 |
| Factuality Evaluation | TruthfulQA | MC2 94.3 | 73 |
| Question Answering | TruthfulQA | TruthfulQA Score 63 | 61 |
| Factuality | TruthfulQA | Accuracy 83.41 | 60 |
| Question Answering | TruthfulQA MC1 | MC1 Accuracy 88.8 | 54 |
| Question Answering | TruthfulQA | Performance Score 81.1 | 52 |
| Machine-Generated Text Detection | TruthfulQA | TPR@FPR-1% 94.85 | 48 |
| Truthful Question Answering | TruthfulQA MC2 | MC2 Accuracy 56.46 | 46 |
| Hallucination | TruthfulQA | Score 75.76 | 42 |
| Question Answering | TruthfulQA | Truthful*Inf Score 88.23 | 42 |
| Open ended generation | TruthfulQA Without Rejected Samples open-ended (full) | Truthfulness 74.67 | 39 |
| Open ended generation | TruthfulQA With All Samples open-ended (full) | Truthfulness 82.75 | 39 |
| Multiple-Choice Question Answering | TruthfulQA MC1 | MC1 Accuracy 76.2 | 39 |
| Question Answering | TruthfulQA | MC1 49.93 | 35 |
| Hallucination Detection | TruthfulQA | AUROC 0.8851 | 33 |
| Truthfulness Evaluation | TruthfulQA | Reliability Score 16.9 | 33 |
| Truthfulness | TruthfulQA | Reward -1.8 | 32 |
| Correctness detection | TruthfulQA | AUC 0.97 | 30 |
| Truthfulness Evaluation | TruthfulQA (test) | MC1 54.95 | 30 |