| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| Must-C & Spoken-SQuAD | contr-cos-all + giga | Normalized Average | 1.1418 | 15 | 1mo ago |
| Average across all benchmarks | LaVer | Average Score | 59.94 | 12 | 1mo ago |
| LoCoMo | | BLEU | 48.7 | 8 | 1mo ago |
| Average (HellaSwag, PiQA, OBQA, COPA, LogiQA, WinoG, SciQ, ARC-E, Lambada) | DoGraph | Accuracy | 42.5 | 7 | 9d ago |
| CoP-QA-F | Talk2DM | AC Score | 97.6 | 6 | 1mo ago |
| AfroNLG (test) | Cheetah | AfroNLG Score | 14.25 | 5 | 1mo ago |
| HorizonSuite | HorizonForge | FID | 33.19 | 4 | 1mo ago |
| Aggregate General, Math, Coding | NBDiff-7B-BASE | Average Accuracy | 65.3 | 4 | 1mo ago |
| Video Quality User Study | ContI2Video | Preference Count | 64 | 3 | 20d ago |