| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| LLM Trustworthiness Benchmark | Mi:dm K 2.5 Pro (March '26) | Bias Score | 89.58 | 17 | 2mo ago |
| DVDs | LIME | Average F1 | 96.6 | 16 | 3mo ago |
| Books | LIME | Avg F1 | 96.7 | 16 | 3mo ago |
| Trust-Memevo Tool-use Domain | TAME | No-Memory | 81.8 | 14 | 3mo ago |
| Trust-Memevo Math Domain | Reasoningbank+Guard | No-Memory Score | 36.7 | 14 | 3mo ago |
| Trust-Memevo Science Domain | Reasoningbank | No-Memory | 81.3 | 14 | 3mo ago |
| AraTrust | LLaMA3-Tamed-70B | Accuracy | 63.41 | 8 | 3mo ago |
| RagTruth | DeepSeek-V3.2 | Score | 93.92 | 5 | 8d ago |
| TruthfulQA | | TruthfulQA Score | 81.88 | 5 | 8d ago |
| Trustworthiness Average (human evaluation) | Sparse Activation Control | Control Win Rate | 0.88 | 2 | 3mo ago |
| Adv Fact (human evaluation) | Sparse Activation Control | Control Wins | 68 | 1 | 3mo ago |
| Privacy (human evaluation) | Sparse Activation Control | Control Wins | 100 | 1 | 3mo ago |
| Robust (human evaluation) | Sparse Activation Control | Control Wins | 100 | 1 | 3mo ago |
| Exag safety (human evaluation) | Sparse Activation Control | Control Wins | 68 | 1 | 3mo ago |