| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| MMLU | RADAR | MMLU Accuracy | 83.66 | 127 | 8d ago |
| MMMLU | DOS-CPT | MMMLU General Knowledge Accuracy | 82.25 | 29 | 3mo ago |
| CEVAL | | Accuracy | 85.52 | 18 | 19d ago |
| C-Eval (val) | LLaMA2-13B | Accuracy | 34.32 | 15 | 1mo ago |
| HAERAE | Llama3.3-70B | Accuracy | 70.9 | 13 | 20d ago |
| C-Eval (test) | Qwen-14B | Accuracy | 71.8 | 13 | 3mo ago |
| General-purpose benchmarks average (test) | Qwen3 8B | Accuracy | 73.8 | 12 | 3mo ago |
| MMLU non-IID distribution, alpha=0.1 | FedAlign-MoE | Accuracy | 39.79 | 10 | 2mo ago |
| MMLU Computer Security | NPO+KL w/ RNA | Accuracy | 46 | 8 | 1mo ago |
| MMLU Corporate Biology | RMU w/ RNA | Accuracy | 60.4 | 8 | 1mo ago |
| MMLU Perturbed | NPO+KL w/ RNA | Accuracy | 53.5 | 8 | 1mo ago |
| KMMLU | Llama3.3-70B | Accuracy | 55.23 | 5 | 1mo ago |
| General Knowledge Evaluation Suite (ARC, HellaSwag, LAMBADA, PIQA, SciQ, WinoGrande, TriviaQA, WebQS, MMLU, GSM8K) | SPLA | ARC-C | 60.2 | 5 | 3mo ago |