| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| LLM Trustworthiness Benchmark | Mi:dm K 2.5 Pro (March ‘26) | Bias Score | 89.58 | 17 | 26d ago |
| DVDs | LIME | Average F1 | 96.6 | 16 | 1mo ago |
| Books | LIME | Avg F1 | 96.7 | 16 | 1mo ago |
| Trust-Memevo Tool-use Domain | TAME | No-Memory | 81.8 | 14 | 1mo ago |
| Trust-Memevo Math Domain | Reasoningbank+Guard | No-Memory Score | 36.7 | 14 | 1mo ago |
| Trust-Memevo Science Domain | Reasoningbank | No-Memory | 81.3 | 14 | 1mo ago |
| AraTrust | LLaMA3-Tamed-70B | Accuracy | 63.41 | 8 | 1mo ago |
| Trustworthiness Average (human evaluation) | Sparse Activation Control | Control Win Rate | 0.88 | 2 | 1mo ago |
| Adv Fact (human evaluation) | Sparse Activation Control | Control Wins | 68 | 1 | 1mo ago |
| Privacy (human evaluation) | Sparse Activation Control | Control Wins | 100 | 1 | 1mo ago |
| Robust (human evaluation) | Sparse Activation Control | Control Wins | 100 | 1 | 1mo ago |
| Exag safety (human evaluation) | Sparse Activation Control | Control Wins | 68 | 1 | 1mo ago |