| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| TyDiQA, MMLU, BBH | SEED | TyDiQA Score | 57.6 | 48 | 16d ago |
| BBH | | Accuracy (BBH) | 66.2 | 24 | 19d ago |
| Dolly-15K alpha=5.0 | FedHDS | ROUGE-L | 35.79 | 22 | 3mo ago |
| Dolly-15K alpha=0.5 | Coreset-Cent | ROUGE-L | 35.48 | 22 | 3mo ago |
| Natural Instructions Meta Non-IID | Coreset-Cent | ROUGE-L | 34.81 | 22 | 3mo ago |
| UNI | ARMADA | ROUGE-L | 34.53 | 21 | 2mo ago |
| SelfInst | ARMADA | ROUGE-L | 21.31 | 21 | 2mo ago |
| SNI | ARMADA | ROUGE-L | 29.63 | 21 | 2mo ago |
| Dolly | LLaMA-3.1-8B | ROUGE-L | 35.34 | 21 | 2mo ago |
| TyDiQA | TACS | Accuracy | 67.41 | 20 | 21d ago |
| CoT | ADG | Reasoning Score | 70.55 | 20 | 1mo ago |
| WizardLM | ADG | Reasoning Score | 75.07 | 20 | 1mo ago |
| Alpaca GPT4 | ADG | Reasoning | 75.43 | 20 | 1mo ago |
| Instruction Tuning Datasets 1.0 (train test) | K-Center-Greedy | Model Performance | 1.45 | 20 | 3mo ago |
| Vicuna | ARMADA | ROUGE-L | 18.73 | 19 | 22d ago |
| Alpaca instruction-tuning 52k | GRADFILTERING | Pairwise Winning Score | 116 | 19 | 3mo ago |
| IT Evaluation Suite (MMLU, BBH, GSM, TydiQA, CodeX, AE) | Alpaca-GPT4 | MMLU | 55.7 | 18 | 3mo ago |
| Data Mix | GRADIENTSPACE | Accuracy | 59.1 | 16 | 3mo ago |
| MT-Bench | S2FT | Score | 5.89 | 8 | 22d ago |
| Magicoder HumanEval | FLOW | Stability | 50.84 | 7 | 26d ago |
| FLAN subset Average (test) | FedRouter* | ROUGE-1 | 56.7 | 7 | 2mo ago |
| FLAN All (test) | FedRouter* | ROUGE-1 | 57.5 | 7 | 2mo ago |
| FLAN Dual (test) | FedRouter* | ROUGE-1 | 56.3 | 7 | 2mo ago |
| FLAN Single (test) | FedRouter* | ROUGE-1 | 56.2 | 7 | 2mo ago |
| AlpacaEval 2.0 (test) | | Win Rate (LC) | 11.49 | 7 | 3mo ago |