| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| AGIEval | | Accuracy | 70.22 | 29 | 1mo ago |
| Aggregate Benchmarks | GOLF | Average Score | 69.26 | 12 | 1mo ago |
| MM-VET | ECSO | REC | 39.5 | 12 | 1mo ago |
| Aggregate Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) | | Average Score | 69 | 10 | 1mo ago |
| Reasoning, Knowledge, and Biomedicine combined datasets (test) | Reasoning | Average Score | 60.47 | 9 | 1mo ago |
| Average Downstream Benchmark Suite | DoGraph | Average Accuracy | 37.9 | 7 | 8d ago |
| Instruction Tuning Suite (BIG-bench Hard, MMLU, TyDi QA, MGSM) | Flan-PaLM 2 (L) | Average Score | 74.1 | 4 | 1mo ago |