| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| AgentDojo | GLM-4.5 | Utility | 78.4 | 32 | 4d ago |
| SLIMORCA (test) | TOSS-Pro | Score | 68.85 | 24 | 1mo ago |
| Just-Eval | - | Just-Eval Average Score | 4.83 | 18 | 1mo ago |
| NQ-Open | CNT | Delta NQ-Open | 5.13 | 17 | 1mo ago |
| MMLU | CNT | ΔMMLU | 0.2 | 17 | 1mo ago |
| Anchor Utility Dataset | CDA | Anchor-PPL | 5.24 | 16 | 1mo ago |
| GM | TVAE | Balanced Acc | 66.6 | 13 | 1mo ago |
| CR | - | Balanced Acc | 68.6 | 13 | 1mo ago |
| CC | DP-CTGAN | Balanced Acc | 67.3 | 13 | 1mo ago |
| BM | TVAE | Balanced Acc | 60.3 | 13 | 1mo ago |
| AD | - | Balanced Accuracy | 81.8 | 13 | 1mo ago |
| ScienceQA (S-QA) | CMRM_dataset | Accuracy | 73.2 | 13 | 1mo ago |
| LLaVA-Bench Coco | ShareGPT4V | Score | 92.3 | 13 | 1mo ago |
| Downstream Tasks | DAPT (nontoxic) | Average Accuracy | 63.4 | 12 | 1mo ago |
| BC | - | Balanced Acc | 72.1 | 11 | 1mo ago |
| MMbench and DocVQA (test) | - | MMbench Score | 87.02 | 7 | 1mo ago |
| XSTest Safe Prompts | FedDPO | Compliance | 97.2 | 3 | 9d ago |
| IQ Dataset | - | - | - | 0 | 1mo ago |