| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Truthful QA | Truthful QA | Accuracy: 68.4 | 83 |
| Question Answering | Truthful-QA | Info Accuracy: 99.2 | 27 |
| Personalization | Truthful QA | Creative Score (ArmoRM): 56 | 18 |
| Hallucination Detection | Truthful-QA | Accuracy: 74.17 | 17 |
| Test-Time Personalization | Truthful QA | Creative Win Rate: 99.6 | 15 |
| CoT faithfulness detection | Truthful-QA | Accuracy: 78 | 12 |
| Question Answering | Truthful QA | LIS: 3.1838 | 10 |