| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| GPQA full dataset | Meta-Debate | Accuracy 66.29 | 20 | 2mo ago | |
| Hellaswag bo | Ours-MoE-SFT | Accuracy 39.16 | 17 | 20d ago | |
| GPQA (test) | RouteGoT | Accuracy 65.7 | 11 | 2mo ago | |
| MMLU | FineWeb-Edu | Accuracy 32.8 | 10 | 1mo ago | |
| Date Understanding (test) | RIOT | Accuracy 78.2 | 8 | 3mo ago | |
| Arc-bo | Ours-SFT | Accuracy 48.39 | 6 | 20d ago | |
| FlameBench | | Accuracy 32.64 | 4 | 2mo ago | |
| Agricultural Benchmark Speech + Image + Text 1.0 (test) | AgriGPT-Omni | Acc (CN) 78 | 4 | 3mo ago | |