| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| TruthfulQA | Sparse MAD | Accuracy | 83.41 | 97 | 5d ago |
| BIOS | PROXIMALByte | Factuality | 56 | 28 | 3mo ago |
| MMLU | GPT-4 | EM | 82.4 | 16 | 3mo ago |
| NQ-Swap | CoDA | Science Category Score | 43.7 | 12 | 3mo ago |
| SelfAware | Base Model | Score | 0.372 | 10 | 3mo ago |
| SimpleVQA | LoMo | Factuality Score | 43.51 | 8 | 5d ago |
| SimpleQA | Kimi-K2 | Factuality Score | 35.3 | 7 | 3mo ago |
| TruthfulQA gen | FGD | BLEU Acc | 42 | 5 | 2mo ago |
| TruthfulQA | | Factuality Score | 55.6 | 4 | 2mo ago |