| Dataset Name | SOTA Method | Metric | Score | Count | Updated |
|---|---|---|---|---|---|
| LoCoMo | | BLEU | 48.7 | 24 | 20d ago |
| LiC Overall | Qwen2.5-7B | LiC Score | 76.4 | 21 | 1mo ago |
| Average across all benchmarks | LaVer | Average Score | 59.94 | 21 | 8d ago |
| MMG2Skill-Bench Average | MMG2Skill | Success Rate | 69.64 | 18 | 1d ago |
| Must-C & Spoken-SQuAD | contr-cos-all + giga | Normalized Average | 1.1418 | 15 | 3mo ago |
| Aggregated All Benchmarks | DiReCT | Average Score | 40.3 | 12 | 2d ago |
| Average (HellaSwag, PiQA, OBQA, COPA, LogiQA, WinoG, SciQ, ARC-E, Lambada) | DoGraph | Accuracy | 42.5 | 7 | 1mo ago |
| CoP-QA-F | Talk2DM | AC Score | 97.6 | 6 | 3mo ago |
| AfroNLG (test) | Cheetah | AfroNLG Score | 14.25 | 5 | 3mo ago |
| Aggregate 9 benchmarks | Qwen2.5-Dense2MoE | Average Score | 58.23 | 4 | 7d ago |
| HorizonSuite | HorizonForge | FID | 33.19 | 4 | 3mo ago |
| Aggregate General, Math, Coding | NBDiff-7B-BASE | Average Accuracy | 65.3 | 4 | 3mo ago |
| Video Quality User Study | ContI2Video | Preference Count | 64 | 3 | 2mo ago |