| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Refusal | Llama-3-8B n≈200 | ASR | 0 | 42 |
| Jailbreak Attack Transferability | Llama-3-8b-Instruct finetuned variants v1 (test) | TSR | 51.2 | 16 |
| LLM Inference Performance | LLaMA-3 8B | TTFT (ms) | 56.03 | 12 |
| Matrix Multiplication Latency | Llama-3 70B | Kernel Latency (µs) | 293.82 | 8 |
| Matrix Multiplication Latency | Llama-3 8B | Kernel-level latency (µs) | 152.69 | 8 |
| Watermark Detection | Llama-3-8B Translate perturbation, 30 tokens 1.0 (test) | Mean P | 0.13 | 6 |
| Watermark Detection Robustness | Llama-3-8B GPT-4o Paraphrase, 150 Tokens | Mean P | 0.26 | 6 |
| Watermark Detection Robustness | Llama-3-8B GPT-4o Paraphrase, 30 Tokens | Mean P | 29 | 6 |
| Watermark Detection Robustness | Llama-3-8B Swap 50%, 30 Tokens | Mean P | 25 | 6 |
| LLM Jailbreaking | Llama-3-8B-Instruct | SRF | 1 | 4 |
| Adversarial Attack | Llama-3-70B successful attacks | Unique Queries Count | 1,321 | 3 |
| Adversarial Attack Diversity Analysis | Llama-3-70B | Average Attack Similarity | 35.2 | 3 |