| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Attention Operator Throughput | Llama2 7B (32 Q-heads / 32 KV-heads / 128 head-dim) | Attention TFLOPS: 207.3 | 30 |
| Jailbreak Attack | Llama2-7B (five finetuned variants) | Average ASR: 0 | 16 |
| Accuracy | LLaMA2-7B (zero-shot) | Zero-Shot Accuracy: 67.18 | 16 |
| Targeted Refusal | Llama2-7B (Generation Evaluation Set) | Completion Accuracy (CA): 93.14 | 15 |
| Sentiment Steering | Llama2-7B (Generation Evaluation Set) | Accuracy (CA): 90.15 | 15 |
| Multi-bit Watermarking | LLaMA2-7B (300 tokens, test) | Perplexity: 7.0486 | 14 |
| Inference Efficiency | LLaMA2-7B (12/128 tokens) | Latency: 1.889 | 13 |
| Jailbreak Attack | Llama2-7B v1 (pretrained) | ASR: 0 | 13 |
| Watermark Detection | Llama2-7B (copy-paste attack) | F1 Score: 97.8 | 11 |
| LLM Fingerprinting | Llama2 7B | AUC: 100 | 10 |
| Watermark Detection | Llama2-7B (paraphrasing) | F1 Score: 91.6 | 8 |
| Watermark Detection | Llama2-7B (synonymous substitution) | F1 Score: 98.5 | 8 |
| Watermark Detection | Llama2-7B (clean) | F1 Score: 100 | 8 |
| LLM Inference | LLaMA2 7B | Latency (ms): 1,052.24 | 4 |
| LLM Training | Llama2-70B (64 x H100-8) | Iteration Time (s): 7.8 | 4 |
| LLM Training | Llama2 7B | Iteration Time (s): 1.4 | 4 |
| Quantization | LLaMA2-7B | Averaged Quantization Time (s): 24 | 4 |
| Text Perplexity Evaluation | Llama2 7B | PPL (Trial 1): 7.626 | 3 |
| Watermark Detection (Paraphrasing Attack) | Llama2-7B | F1 Score: 91.8 | 3 |
| Inference Latency | LLaMA2 70B | Latency (ms): 1,450 | 3 |
| Knowledge Editing | LLaMA2-13B (sequential batch-editing setup) | S Score: 84.7 | 3 |
| LLM Training | Llama2-7B (tpu-v5p-512) | Iteration Time (s): 2.5 | 3 |
| LLM Training | Llama2 70B (tpu-v5p-1024) | Iteration Time (s): 11.6 | 2 |
| LLM Training | Llama2 70B (64 x Trainium2-16) | Iteration Time (s): 11.2 | 1 |
| LLM Training | Llama2 7B (64 x Trainium2-16) | Iteration Time (s): 1.2 | 1 |