| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Attention Operator Throughput | Llama2 7B (32 Q-heads/32 KV-heads/128 Head-dimension) | Attention TFLOPS | 207.3 | 42 |
| LLM Inference | LLaMA2 7B | TTFT (ms) | 11.09 | 33 |
| Distributed Training | LLaMA2 13B | Number of Recomputed Layers | 0 | 21 |
| Jailbreak attack | Llama2-7b five finetuned variants | Average ASR | 0 | 16 |
| Accuracy | LLaMA2-7B zero-shot | Zero-Shot Accuracy | 67.18 | 16 |
| Targeted Refusal | Llama2-7B Generation Evaluation Set | Completion Accuracy (CA) | 93.14 | 15 |
| Sentiment Steering | Llama2-7B Generation Evaluation Set | Accuracy (CA) | 90.15 | 15 |
| Multi-bit Watermarking | LLaMA2-7B 300 tokens (test) | Perplexity | 7.0486 | 14 |
| Inference Efficiency | LLaMA2-7B 12/128 tokens | Latency | 1.889 | 13 |
| Jailbreak Attack | llama2-7b v1 (pretrained) | ASR | 0 | 13 |
| Watermark Detection | Llama2-7B Copy-paste attack | F1 Score | 97.8 | 11 |
| LLM fingerprinting | Llama2 7B | AUC | 100 | 10 |
| Watermark Detection | Llama2-7B Paragraphing | F1 Score | 91.6 | 8 |
| Watermark Detection | Llama2-7B Synonymous substitution | F1 Score | 98.5 | 8 |
| Watermark Detection | Llama2-7B Clean | F1 Score | 100 | 8 |
| Efficiency Analysis | LLaMA2-7B | Memory Usage (GB) | 1.34 | 7 |
| Jailbreak Defense | LLaMA2-7B Adaptive AutoDAN-T attack | ASR | 17 | 6 |
| Jailbreak Defense | LLaMA2-7B Adaptive PAIR attack | Attack Success Rate (ASR) | 0 | 6 |
| LLM Jailbreaking | Llama2-DA | SRF | 57 | 4 |
| Relative Pos. Attention | Llama2-7b (q=32, k=32) (1k) | TFLOPS (Relative Pos. Attention) | 114.85 | 4 |
| Share Question Mask Attention | Llama2-7b (q=32, k=32) (1k) | TFLOPS (Share QK Mask Attention, 1k) | 39.81 | 4 |
| Global Sliding Window Attention | Llama2-7b q=32, k=32 (1k) | TFLOPS | 67.36 | 4 |
| PrefixLM Attention | Llama2-7b (q=32, k=32) (8k) | TFLOPS (PrefixLM Attention) | 163.7 | 4 |
| LLM Inference | LLaMA2-7B 1,024 tokens | Latency (ms) | 20 | 4 |
| LLM Training | Llama2-70B (64 x H100-8) | Iteration Time (s) | 7.8 | 4 |