| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HotpotQA | CDA-M | Reliability Score (RS) | 51.8 | 52 | 1mo ago |
| LOFT | MiniLM | NQ Score | 100 | 42 | 1mo ago |
| All Datasets Aggregated | GuarantRAG | Average Performance Score | 76.6 | 40 | 8d ago |
| ICR2 | Phi-3-7B-128K | NQ Score | 87 | 37 | 1mo ago |
| NQ | ReFeedL | Accuracy | 77.1 | 23 | 1mo ago |
| Spec-Bench RAG | SpecBound | CR | 5.48 | 21 | 3d ago |
| Average | FastInsight | Win Rate | 65 | 18 | 1mo ago |
| UltraDomain mix | FastInsight | Win Rate | 76.2 | 18 | 1mo ago |
| UltraDomain agriculture | FastInsight | Win Rate | 95 | 18 | 1mo ago |
| BSARD-G | FastInsight | Win Rate | 85.6 | 18 | 1mo ago |
| LOFT and ICR2 Combined | GPT-4-turbo | Overall Score | 74 | 18 | 1mo ago |
| Long-context benchmarks | | RAG Score (8k Context) | 53.7 | 16 | 1mo ago |
| ACL-OCL | FastInsight | Win Rate | 58.2 | 16 | 1mo ago |
| Retrieval-Augmented Generation | | Performance at 8k Context Length | 65.4 | 13 | 1mo ago |
| Legal Consultation (test) | Legal-DC | Recall | 78.02 | 12 | 1mo ago |
| News Articles | DA-RAG | Comprehensiveness | 96.7 | 12 | 1mo ago |
| Mix | DA-RAG | Comprehensiveness | 95.9 | 12 | 1mo ago |
| Agriculture | DA-RAG | Comprehensiveness | 97.6 | 12 | 1mo ago |
| TheoremQA | Ours | Accuracy | 66.3 | 12 | 1mo ago |
| CHAMP | ReFeedL | Accuracy | 45.2 | 12 | 1mo ago |
| SynthWiki 20 documents | LongLLMLingua-rk + Cal. | Mean Score | 95.75 | 12 | 1mo ago |
| SynthWiki 10 documents | LongLLMLingua-rk + Cal. | Average Score | 94.44 | 12 | 1mo ago |
| NaturalQuestion 20 documents | Attention sorting | Average Score | 0.6289 | 12 | 1mo ago |
| NaturalQuestion 10 documents | LongLLMLingua-rk + Cal. | Average Score | 66.17 | 12 | 1mo ago |
| IFS-REL | NaiveRAG | Indexing Time (mins) | 2 | 11 | 1mo ago |