| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Factuality-based Question Answering | FreshQA 2025/11/24 | C 44 | 40 |
| Question Answering | FreshQA (out-of-domain) | Precision 67.2 | 12 |
| Question Answering | FreshQA (train test) | BLEU 28.37 | 4 |
| Question Answering | FreshQA | EM 26.6 | 3 |
| Temporal Question Answering | FreshQA | AUROC 0.657 | 2 |
| Factual Reasoning | FreshQA v2 | Baseline Wins 16 | 2 |