| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MM-VET | ECSO | REC | 39.5 | 12 | 4d ago |
| Aggregate Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) | | Average Score | 69 | 10 | 4d ago |
| Reasoning, Knowledge, and Biomedicine combined datasets (test) | Reasoning | Average Score | 60.47 | 9 | 3d ago |
| AGIEval | | Accuracy | 70.22 | 8 | 4d ago |
| Instruction Tuning Suite (BIG-bench Hard, MMLU, TyDi QA, MGSM) | Flan-PaLM 2 (L) | Average Score | 74.1 | 4 | 4d ago |