| Dataset Name | SOTA Method | Metric | Score | Trend | |
|---|---|---|---|---|---|
| Natural Questions (NQ) (test) | FiE | Exact Match (EM) | 58.4 | 134 | 3mo ago |
| WEBQUESTIONS (test) | Memory Networks | F1 Score | 42.2 | 27 | 3mo ago |
| DTA | Llama3.3-LightRAG-mix | SS Score | 98.11 | 10 | 1d ago |
| AIME 2025 (test) | GRPO | Accuracy | 70 | 9 | 3mo ago |
| MATH500 (test) | GRPO | Accuracy | 0.94 | 9 | 3mo ago |
| OpenQA | Trajectory | Accuracy | 88.2 | 8 | 3mo ago |
| BGB (test) | Gemma 3 (12B) | Factual Correctness (%) | 76.4 | 8 | 3mo ago |
| LegalMC4 (test) | | LLM Factual Correctness | 77.2 | 8 | 3mo ago |
| AudioCaps-QA | MiDashengLM | FENSE Score | 54.31 | 3 | 2mo ago |
| MusicQA | MiDashengLM | FENSE Score | 62.35 | 3 | 2mo ago |