| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | Alpaca | Speedup (x) | 4.13 | 63 |
| Language Modeling | Alpaca | Perplexity | 3.22 | 31 |
| LLM Inference | Alpaca | Speedup | 2.95 | 21 |
| Safety Defense Against Harmful Fine-tuning Attacks | Alpaca harmful subset (test) | Harmful Score | 26.6 | 21 |
| Conversational Ability | Alpaca (test) | AlpacaEval LC Win Rate | 71.87 | 20 |
| Instruction Tuning | Alpaca instruction-tuning 52K | Pairwise Winning Score | 116 | 19 |
| Long-form Reasoning | Alpaca | Avg LogProb per Answer | -1.5772 | 14 |
| Prompt Recovery | Alpaca | BLEU-1 | 43.24 | 14 |
| Instruction Following | Alpaca instruction-following (test) | PPL | 3.85 | 12 |
| Faithfulness Measurement | Alpaca | BLEU | 0.601 | 12 |
| Instruction Following | Alpaca (test) | Kendall's Tau | 4.96 | 11 |
| Bit-flip Inference Cost Attack | Alpaca (test) | Avg Length (Original) | 1,117 | 10 |
| Fine-tuning Robustness | Alpaca | FSR | 100 | 10 |
| Adaptive Care Policy Learning | ALPACA 1000 simulated patient rollouts | Cumulative Reward | 3.38 | 7 |
| LLM Routing | Alpaca In-Domain | AUROC | 0.7202 | 7 |
| Inference Cost Attack | Alpaca Vicuna-7B (test) | Average Length | 1,874 | 6 |
| Inference Cost Attack | Alpaca Samantha-7B (test) | Average Length | 1,944 | 6 |
| Inference Cost Attack | Alpaca Llama2-7B (test) | Average Length | 191 | 6 |
| Machine Unlearning | Alpaca-57k (OOD) | Delta ASR | 41.4 | 6 |
| Machine Unlearning | Alpaca-57k (Seen) | Delta ASR | 96.7 | 6 |
| Budgeted Subset Selection | Alpaca 5% retention | SUM | 157.162 | 6 |
| Teacher Attribution | Alpaca | Accuracy | 56 | 6 |
| Watermark Detection | Alpaca instruction-following 52K | TPR | 61.67 | 5 |
| Budgeted Subset Selection | Alpaca 15% retention (train) | Total Sum | 134.25 | 5 |
| Safety Alignment | Alpaca 7B (test) | HV Score | 1.2916 | 5 |