| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| BIOS | PROXIMALByte | Factuality | 56 | 28 | 4d ago |
| TruthfulQA | GANPO | Accuracy | 55.67 | 18 | 4d ago |
| MMLU | GPT-4 | EM | 82.4 | 16 | 4d ago |
| NQ-Swap | CoDA | Science Category Score | 43.7 | 12 | 3d ago |
| SelfAware | Base Model | Score | 0.372 | 10 | 4d ago |
| SimpleQA | Kimi-K2 | Factuality Score | 35.3 | 7 | 4d ago |