| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Fact-checking | ExpertQA | Balanced Accuracy | 60.3 | 15 |
| Attributable Text Generation | ExpertQA v1 (test) | AutoAIS | 0.6612 | 9 |
| General Question Answering | ExpertQA | Reward | 0.2385 | 8 |
| Question Answering | EXPERTQA (test) | Claim Recall | 19.27 | 6 |
| Retrieval-Augmented Generation | ExpertQA | Faithfulness | 73.9 | 5 |
| Medical Long-form Answering | ExpertQA Biomed | Relevance | 3.7 | 4 |
| URL Health and Self-Correction | ExpertQA | Total URLs | 7,985 | 3 |