| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Natural Questions (NQ) (test) | FiE | Exact Match (EM) | 58.4 | 134 | 1mo ago |
| WEBQUESTIONS (test) | Memory Networks | F1 Score | 42.2 | 27 | 1mo ago |
| AIME 2025 (test) | GRPO | Accuracy | 70 | 9 | 1mo ago |
| MATH500 (test) | GRPO | Accuracy | 0.94 | 9 | 1mo ago |
| OpenQA | Trajectory | Accuracy | 88.2 | 8 | 1mo ago |
| BGB (test) | Gemma 3 (12B) | Factual Correctness (%) | 76.4 | 8 | 1mo ago |
| LegalMC4 (test) | | LLM Factual Correctness | 77.2 | 8 | 1mo ago |
| AudioCaps-QA | MiDashengLM | FENSE Score | 54.31 | 3 | 22d ago |
| MusicQA | MiDashengLM | FENSE Score | 62.35 | 3 | 22d ago |