| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HotpotQA | CDA-M | Reliability Score (RS) | 51.8 | 52 | 3d ago |
| LOFT | MiniLM | NQ Score | 100 | 42 | 4d ago |
| ICR2 | Phi-3-7B-128K | NQ Score | 87 | 37 | 4d ago |
| NQ | ReFeedL | Accuracy | 77.1 | 23 | 4d ago |
| Average | FastInsight | Win Rate | 65 | 18 | 3d ago |
| UltraDomain mix | FastInsight | Win Rate | 76.2 | 18 | 3d ago |
| UltraDomain agriculture | FastInsight | Win Rate | 95 | 18 | 3d ago |
| BSARD-G | FastInsight | Win Rate | 85.6 | 18 | 3d ago |
| LOFT and ICR2 Combined | GPT-4-turbo | Overall Score | 74 | 18 | 4d ago |
| ACL-OCL | FastInsight | Win Rate | 58.2 | 16 | 3d ago |
| News Articles | DA-RAG | Comprehensiveness | 96.7 | 12 | 4d ago |
| Mix | DA-RAG | Comprehensiveness | 95.9 | 12 | 4d ago |
| Agriculture | DA-RAG | Comprehensiveness | 97.6 | 12 | 4d ago |
| TheoremQA | Ours | Accuracy | 66.3 | 12 | 4d ago |
| CHAMP | ReFeedL | Accuracy | 45.2 | 12 | 4d ago |
| SynthWiki 20 documents | LongLLMLingua-rk + Cal. | Mean Score | 95.75 | 12 | 4d ago |
| SynthWiki 10 documents | LongLLMLingua-rk + Cal. | Average Score | 94.44 | 12 | 4d ago |
| NaturalQuestion 20 documents | Attention sorting | Average Score | 0.6289 | 12 | 4d ago |
| NaturalQuestion 10 documents | LongLLMLingua-rk + Cal. | Average Score | 66.17 | 12 | 4d ago |
| WoW | ConsJudge | LLM Score | 88.87 | 11 | 4d ago |
| MARCOQA | ConsJudge | LLM Score | 88.25 | 11 | 4d ago |
| ASQA | ConsJudge | str-EM | 42.44 | 11 | 3d ago |
| TriviaQA | ConsJudge | Accuracy | 88.26 | 11 | 4d ago |
| RAG-Bench | RAAT | F1 (Golden Only) | 87.15 | 11 | 4d ago |
| UltraDomain Pathology (test) | Hyper-KGGen+ | Comprehension | 92.35 | 9 | 4d ago |