| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | Alpaca | Speedup (x) | 5.27 | 173 |
| Language Modeling | Alpaca | Perplexity | 3.22 | 61 |
| LLM Inference | Alpaca | Speedup | 5.56 | 57 |
| Detection Efficiency | Alpaca OnlyTarget Long (malicious) | ATGR | 8.378 | 56 |
| Detection Efficiency | Alpaca OnlyTarget Long (benign) | ATGR | 6.115 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Short | TPR | 100 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Medium | TPR | 100 | 56 |
| Instruction Following | Alpaca | Average Accepted Length | 6.27 | 51 |
| Offline Synthetic Data Generation | Alpaca | Generation Time (s) | 68.8 | 44 |
| Safety Evaluation | Alpaca | HRR | 0 | 38 |
| Instruction Following | Alpaca SFT Small Prompts | Self-BLEU | 46.6 | 36 |
| Creative Writing | Alpaca SFT Short Stories | Self-BLEU (Diversity) | 3.7 | 36 |
| Instruction following and safety evaluation | Alpaca | BRT Score | 52.2 | 36 |
| Targeted Attack Detection | Alpaca AddTarget Medium | TPR | 100 | 35 |
| Instruction Following | Alpaca clean (test) | F1 Score | 79.81 | 32 |
| Long-form Generation | Alpaca | Perplexity (PPL) | 2.4268 | 30 |
| Instruction Following | Alpaca poisoned (test) | F1 Score | 99.93 | 28 |
| Chatbot workload | Alpaca | Average PTLA (s/token) | 0.3 | 28 |
| Prompt injection attack detection | Alpaca | TPR | 100 | 28 |
| Answer Accuracy | Alpaca | BRT Accuracy | 40.6 | 26 |
| MMLU Evaluation | Alpaca | Accuracy | 32.26 | 24 |
| LLM Unlearning | Virtual-Alpaca | Forget Rate | 44 | 24 |
| Instruction Following | Alpaca Finance | Average Length | 2.78 | 22 |
| Instruction Following | Alpaca (test) | SR Score | 3.15 | 21 |
| Safety defense against harmful fine-tuning attacks | Alpaca harmful subset (test) | Harmful Score | 26.6 | 21 |