| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| TriviaQA | Agentic-R | Exact Match | 69.02 | 39 | 3d ago |
| NQ | SkillOrchestra+ | Exact Match (EM) | 54.8 | 36 | 3d ago |
| PopQA | Search-R1 | EM | 44.8 | 36 | 3d ago |
| General QA NQ, TriviaQA, PopQA | Search-R1-GRPO + LLDS | NQ Accuracy | 51.8 | 34 | 3d ago |
| TriviaQA (test val) | DeepControl | EM | 68.2 | 24 | 3d ago |
| Natural Questions (NQ) (test val) | DeepControl | EM | 55.8 | 24 | 3d ago |
| PopQA | SkillOrchestra+ | Accuracy | 48.8 | 18 | 3d ago |
| TriviaQA | SkillOrchestra+ | Accuracy | 80.2 | 18 | 3d ago |
| TriviaQA | Search-R1 | EM | 64.4 | 18 | 3d ago |
| NQ (Natural Questions) | Search-R1 | EM | 46.1 | 18 | 3d ago |
| PopQA out-of-domain (val test) | Search-R2 | Exact Match (EM) | 50.1 | 15 | 3d ago |
| TriviaQA out-of-domain (val test) | Search-R2 | EM | 70.9 | 15 | 3d ago |
| NQ (Natural Questions) in-domain (val/test) | Search-R2 | Exact Match | 50.9 | 15 | 3d ago |
| TriviaQA (test) | Search-R1 + EKA | F1 | 66.1 | 11 | 3d ago |
| PopQA (test) | Workflow-R1-Search | EM | 49.3 | 10 | 3d ago |
| TriviaQA (test) | Workflow-R1-Search | EM | 73.3 | 10 | 3d ago |
| PopQA (test val) | DeepControl | Exact Match (EM) | 52.1 | 4 | 3d ago |
| MMLU-Pro (test) | GEPA | Mean Accuracy | 79.55 | 4 | 3d ago |
| MMLU-Pro (test) | ETGPO | Optimization Token Usage | 595 | 3 | 3d ago |