| Dataset Name | SOTA Method | Metric | Score | Trend |
|---|---|---|---|---|
| NQ | UnifiedQA-3b | QA-F1 | 37.6 | 16 |
| HotpotQA | UnifiedQA-3b | QA-F1 | 47 | 16 |
| Curated TREC | UnifiedQA-3b | QA-F1 | 41.8 | 16 |
| Web Questions | UnifiedQA-3b | QA-F1 | 48.1 | 16 |
| HotpotQA (HQA) (test) | Hash-RAG | Exact Match | 0.311 | 10 |
| TriviaQA (TQA) (test) | Hash-RAG | Exact Match | 64.5 | 10 |
| HotpotQA top 1000 samples (test) | BIDER | F1 | 38.6 | 10 |
| TriviaQA (TQA) top 1000 samples (test) | BIDER | Exact Match | 52.3 | 10 |
| NaturalQuestions (NQ) top 1000 samples (test) | BIDER | Exact Match | 40.3 | 10 |
| WebQuestions | GPT-4+RFKG-CoT | Accuracy | 78.2 | 8 |
| ELI5 | DPR | ROUGE-L | 20.75 | 8 |
| TriviaQA | MindRef | Exact Match | 72.94 | 8 |
| SituatedQA Nq=300 | Gemini3Pro | Accuracy | 44.6 | 6 |
| AmbigQA Nq=300 | Gemini3Pro | Accuracy | 0.473 | 6 |
| Federated Dataset 1 unseen tasks (test) | FedDPA-T | AVG Score | 78.76 | 4 |