| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| ELI5 (test) | RBG | ROUGE-L | 27.13 | 54 | 3d ago |
| ELI5 | SearChain | ROUGE-L | 25.57 | 27 | 4d ago |
| Novel GraphRAG-Bench | A-RAG (Full) | LLM-Acc | 85.3 | 20 | 4d ago |
| GraphRAG-Bench Med | A-RAG (Full) | LLM Accuracy | 93.1 | 20 | 4d ago |
| ASQA | Amber | str-em | 51.3 | 15 | 4d ago |
| Biography | EWE | VeriScore F1 | 49.7 | 14 | 4d ago |
| AlpacaFact | EWE | VeriScore F1 | 66.9 | 14 | 4d ago |
| Fava | EWE | VeriScore F1 | 61 | 14 | 4d ago |
| LongFact | EWE | VeriScore F1 | 75.9 | 14 | 4d ago |
| Long-form QA (test) | ALARM | Win Rate vs. Holistic Reward | 61.7 | 13 | 4d ago |
| ELI5 (val) | | F1 | 31.5 | 11 | 3d ago |
| ELI5 KILT (test) | RT + C-REALM | F1 | 25.4 | 8 | 3d ago |
| ALCE LFQA | ATTR. FIRST_CoT | ROUGE-L | 38.6 | 7 | 4d ago |
| ELI5 standard original | Fourier-BART-FP | RL Score | 26.9 | 5 | 4d ago |
| GroundBench (test) | RHIO-13B | Faithfulness (Full) | 87.5 | 4 | 4d ago |
| LFQA | | AIS (Decomposition) | 90.9 | 4 | 4d ago |
| KILT ELI5 (test) | NTP + NSP | Retrieval Score | 36.3 | 4 | 4d ago |
| HQ2A | Error-Informed Refinement (EIR) | Comprehensiveness | 100 | 3 | 4d ago |
| LFQA (test) | ATTR. FIRST | R-L | 38.2 | 3 | 4d ago |
| KILT ELI5 (dev test) | KID | RL Score | 26.3 | 3 | 4d ago |
| MS MARCO (evaluation) | RBG | Fluency | 2.7 | 2 | 4d ago |