| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| TruthfulQA | Sparse MAD | Accuracy 83.41 | 60 | 5d ago | |
| BIOS | PROXIMALByte | Factuality 56 | 28 | 1mo ago | |
| MMLU | GPT-4 | EM 82.4 | 16 | 1mo ago | |
| NQ-Swap | CoDA | Science Category Score 43.7 | 12 | 1mo ago | |
| SelfAware | Base Model | Score 0.372 | 10 | 1mo ago | |
| SimpleQA | Kimi-K2 | Factuality Score 35.3 | 7 | 1mo ago | |
| TruthfulQA gen | FGD | BLEU Acc 42 | 5 | 1mo ago | |
| TruthfulQA | | Factuality Score 55.6 | 4 | 1mo ago | |