| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Bolmo Evaluation Suite GenQA 7B | Llama 3.1 70B | GenQA Average | 81.6 | 29 | 3d ago |
| MsMARCO (test) | Match-LSTM | ROUGE Score | 40.7 | 18 | 3d ago |
| MsMARCO (dev) | RAG | ROUGE Score | 57.2 | 11 | 3d ago |
| Lu Xun's essay collections | CharacterBot | Content Score | 3.758 | 10 | 3d ago |
| Amazon (test) | Prior-Aug | EM | 57.99 | 8 | 3d ago |
| Reddit (test) | | EM | 61.19 | 8 | 3d ago |
| BioASQ (test) | SWEP | EM | 43.01 | 8 | 3d ago |
| NYT (test) | SWEP | EM | 76.42 | 8 | 3d ago |
| Wiki (test) | SWEP | EM | 73.34 | 8 | 3d ago |
| DriveLM (test) | DriveLM-Agent | BLEU-4 | 53.09 | 5 | 3d ago |
| SQuAD | Blended RAG | EM | 57.63 | 3 | 3d ago |