| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Factual Consistency Evaluation | TRUE benchmark | PAWS (AUC-ROC): 98.4 | 37 |
| Factual Consistency Evaluation | TRUE 1.0 (test) | Frank AUC: 91.5 | 20 |
| Factual Grounding Evaluation | TRUE (sampled 100 entries from each of 11 datasets) | ROC-AUC: 0.86 | 3 |