| Dataset Name | SOTA Method | Metric | Score | Count | Updated |
|---|---|---|---|---|---|
| TruthfulQA | | Accuracy | 75 | 108 | 1mo ago |
| TruthfulQA | A-LQR | T·I Score | 84.7 | 59 | 1mo ago |
| TruthfulQA | SG | BLEU-Acc | 54.7 | 35 | 20h ago |
| TruthfulQA | Aligner | Reliability Score | 16.9 | 33 | 1mo ago |
| TruthfulQA (test) | PromptCD | MC1 | 54.95 | 30 | 4d ago |
| TruthfulQA | DFT | Avg@k | 66.63 | 27 | 21d ago |
| TruthfulQA medical (test) | BioMistral 7B TIES | Health Score | 83.6 | 22 | 3mo ago |
| TruthfulQA | LoPT-GRPO | Accuracy | 65.75 | 20 | 4d ago |
| TruthfulQA | Council Mode | TruthfulQA Score | 82.6 | 12 | 1mo ago |
| TruthfulQA | STM | TruthfulQA Delta | 12 | 10 | 13d ago |
| TruthfulQA | IPO | Normalized Accuracy | 58.76 | 10 | 2mo ago |
| TruthfulQA | PASf | Mean Improvement | 0.33 | 9 | 15d ago |
| TruthfulQA | EvoPref-Best | TQA Score | 54.9 | 9 | 21d ago |
| TruthfulQA | Llama3.1-8B-Instruct | Average Score (@8) | 68.69 | 8 | 1mo ago |
| TruthfulQA generation | QWEN 2.5 7B-I | Exclusive Catch Rate (@10%) | 7.9 | 3 | 1mo ago |
| TruthfulQA | Reporting-and-penalty mechanism | Accuracy under Attack | 100 | 2 | 1mo ago |