| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Fact-checking | ExpertQA | Balanced Accuracy | 60.3 | 15 |
| Attributable Text Generation | ExpertQA v1 (test) | AutoAIS | 0.6612 | 9 |
| Question Answering | EXPERTQA (test) | Claim Recall | 19.27 | 6 |
| Retrieval-Augmented Generation | ExpertQA | Faithfulness | 73.9 | 5 |
| Medical Long-form Answering | ExpertQA Biomed | Relevance | 3.7 | 4 |