| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| 8 Vision Tasks, Low Confidence setting (test) | | Absolute Accuracy | 94 | 32 | 1mo ago |
| 8 Vision Tasks, Norm Mismatch setting (test) | | Absolute Accuracy | 93.3 | 32 | 1mo ago |
| 20-task Model Merging Benchmark (14-task + EMNIST, CIFAR10, Food101, FashionMNIST, RenderedSST2, KMNIST) | Fine-tuned | Avg Absolute Accuracy | 94.7 | 30 | 1mo ago |
| TA-8 (test) | | Accuracy | 94.3 | 28 | 1mo ago |
| Vision Benchmark 20-task | | Average Accuracy | 94.7 | 24 | 1mo ago |
| 8-task vision benchmark | | Average Accuracy | 95.8 | 24 | 1mo ago |
| 8-task vision suite | TALL-Masks | Average Performance | 94.3 | 14 | 1mo ago |
| MNIST multi-task (test) | MuDSC_Zip | Accuracy | 94.62 | 9 | 1mo ago |
| TALL-20 (test) | | Accuracy | 93.5 | 8 | 1mo ago |
| TALL-14 (test) | | Accuracy | 93.4 | 8 | 1mo ago |
| Multi-Fashion MNIST (test) | DSelect-k | Accuracy 1 | 83.78 | 7 | 1mo ago |
| Multi-MNIST (test) | | Task 1 Accuracy | 92.61 | 7 | 1mo ago |