| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GPQA full dataset | Meta-Debate | Accuracy | 66.29 | 20 | 24d ago |
| GPQA (test) | RouteGoT | Accuracy | 65.7 | 11 | 1mo ago |
| Date Understanding (test) | RIOT | Accuracy | 78.2 | 8 | 1mo ago |
| FlameBench | | Accuracy | 32.64 | 4 | 26d ago |
| Agricultural Benchmark Speech + Image + Text 1.0 (test) | AgriGPT-Omni | Acc (CN) | 78 | 4 | 1mo ago |