| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | Alpaca | Speedup (x) | 5.27 | 111 |
| Language Modeling | Alpaca | Perplexity | 3.22 | 61 |
| Detection Efficiency | Alpaca OnlyTarget Long (malicious) | ATGR | 8.378 | 56 |
| Detection Efficiency | Alpaca OnlyTarget Long (benign) | ATGR | 6.115 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Short | TPR | 100 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Medium | TPR | 100 | 56 |
| Offline Synthetic Data Generation | Alpaca | Generation Time (s) | 68.8 | 44 |
| Targeted Attack Detection | Alpaca AddTarget Medium | TPR | 100 | 35 |
| Chatbot workload | Alpaca | Average PTLA (s/token) | 0.3 | 28 |
| Prompt injection attack detection | Alpaca | TPR | 100 | 28 |
| Instruction Following | Alpaca Finance | Average Length | 2.78 | 22 |
| LLM Inference | Alpaca | Speedup | 2.95 | 21 |
| Safety defense against harmful fine-tuning attacks | Alpaca harmful subset (test) | Harmful Score | 26.6 | 21 |
| Instruction Tuning | Alpaca GPT4 | Reasoning | 75.43 | 20 |
| Conversational Ability | Alpaca (test) | Alpaca LC Win Rate | 71.87 | 20 |
| Instruction Tuning | Alpaca instruction-tuning 52k | Pairwise Winning Score | 116 | 19 |
| Abnormal Behavior Detection | Alpaca GPT4 (test) | Accuracy | 100 | 17 |
| Long-form reasoning | Alpaca | Avg LogProb per Answer | -1.5772 | 14 |
| Prompt Recovery | Alpaca | BLEU-1 | 43.24 | 14 |
| Instruction Following | Alpaca instruction-following (test) | PPL | 3.85 | 12 |
| Faithfulness Measurement | Alpaca | BLEU | 0.601 | 12 |
| Instruction Following | Alpaca (test) | Kendall's Tau | 4.96 | 11 |
| Bit-flip Inference Cost Attack | Alpaca (test) | Avg Length (Original) | 1,117 | 10 |
| Fine-tuning Robustness | Alpaca Dataset | FSR | 100 | 10 |
| Attack Success Rate | Alpaca | ASR (Alpaca) | 0 | 8 |